violinplot

Makes a violin plot of all numerical features against a specified categorical target column.

Description

A violin plot plays a similar role to a box-and-whisker plot. It shows the distribution of quantitative data across the levels of one or more categorical variables so that those distributions can be compared. Unlike a box plot, in which every plot component corresponds to actual data points, a violin plot draws a kernel density estimate of the underlying distribution.
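
To make the kernel density estimate concrete, here is a minimal sketch of a one-dimensional Gaussian KDE using only the Python standard library. It is an illustration of the general technique, not the code datasist or its plotting backend runs; the function name `gaussian_kde_1d`, the Scott's-rule bandwidth, and the synthetic samples are all assumptions made for the example.

```python
import math
import random
import statistics


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each point in `grid`.

    Hypothetical helper for illustration only.
    """
    n = len(samples)
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumed default).
        bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    # Normalising constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Synthetic samples standing in for one numerical feature within one category.
rng = random.Random(0)
samples = [rng.gauss(5.0, 1.0) for _ in range(200)]

# Evaluate the density on an evenly spaced grid that extends past the data.
lo, hi = min(samples) - 3, max(samples) + 3
step = (hi - lo) / 399
grid = [lo + i * step for i in range(400)]
density = gaussian_kde_1d(samples, grid)
```

Mirroring `density` around a vertical axis gives the outline of one "violin"; a box plot of the same samples would show only the quartiles and whiskers.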

violinplot(data=None,
           num_features=None, 
           target=None,
           fig_size=(5,5),
           save_fig=False):
    '''
    Parameters
    ------------
        data : DataFrame, array, or list of arrays.
            Dataset for plotting.

        num_features: Scalar, array, or list.
            The numerical features in the dataset. If None,
            we try to infer the numerical columns from the dataframe.

        target: array, pandas series, list.
            A categorical target column. Maximum number of categories is 10 and minimum is 1.

        fig_size: tuple, Default (5,5)
            The size of the figure object.

        save_fig: bool, Default False.
            If True, saves the current plot to the current working directory.
    '''
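
The column inference and the category limits described in the docstring could look roughly like the sketch below. `infer_numeric_features` is a hypothetical helper, not part of datasist, and the use of pandas `select_dtypes` is an assumption about one reasonable way to pick numerical columns. The small DataFrame is a stand-in that reuses the column names from the examples on this page.

```python
import pandas as pd


def infer_numeric_features(df, target):
    """Hypothetical helper: names of numeric columns, excluding the target."""
    numeric = df.select_dtypes(include="number").columns
    return [col for col in numeric if col != target]


# Tiny stand-in frame; a real call would load the full data set instead.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

num_features = infer_numeric_features(df, target="species")

# The docstring allows between 1 and 10 categories in the target column.
n_categories = df["species"].nunique()
if not 1 <= n_categories <= 10:
    raise ValueError("target must have between 1 and 10 categories")
```

Checking the category count before plotting keeps a high-cardinality column from producing an unreadable figure with dozens of violins.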

Examples

We are using the classic iris data set and a Jupyter notebook in the following examples.

Violin plots can be created for every numerical column in a DataFrame, grouped by a specified target:

import pandas as pd
import datasist.visualizations as vs

df = pd.read_csv('iris.csv')
vs.violinplot(data=df, target='species')

Violin plots can also be created for specified columns only:

vs.violinplot(data=df, num_features=['petal_width'], target='species')

The size of the plots can be changed using the fig_size parameter:

vs.violinplot(data=df,
              num_features=['petal_width'],
              fig_size=(3,3),
              target='species')

To save a figure to the current working directory, set the save_fig parameter to True:

vs.violinplot(data=df,
              num_features=['petal_width'],
              target='species', 
              save_fig=True)

To improve this documentation, visit the datasist-doc repository