Sweetviz

Visualization Using Sweetviz Library in Python¶

Sweetviz is an open-source Python library for exploratory data analysis (EDA). It generates a self-contained, interactive HTML report that summarizes and visualizes a dataset with just a few lines of code.

The library is built around quickly visualizing target values and comparing datasets. Its goal is to speed up the analysis of target characteristics, training vs. testing data, and similar data characterization tasks.

Some key features of Sweetviz include:

Quick generation of EDA (Exploratory Data Analysis) reports with a couple of lines of code
Ability to compare two datasets side-by-side and identify differences between them
Detailed analysis of the distribution of values for each feature: histograms for numerical features, bar charts for categorical features, and summary statistics such as quartiles and most frequent values
Identification of missing values and their distribution in the dataset
Association analysis between features in a single matrix: Pearson correlation for numerical pairs, uncertainty coefficient for categorical pairs, and correlation ratio for categorical-numerical pairs
Support for both numerical and categorical features
Export of reports to HTML files for easy sharing and presentation
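The side-by-side comparison listed above can be sketched as follows. This is a minimal illustration, not part of the original notebook: `split_train_test` and `compare_report` are hypothetical helper names, the random split assumes the DataFrame has a unique index, and the output file name is arbitrary. `sweetviz.compare` takes each dataset as a `[DataFrame, name]` pair.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Randomly split a DataFrame into train and test parts.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)
    return train, test


def compare_report(train: pd.DataFrame, test: pd.DataFrame):
    """Build a side-by-side Sweetviz report for two DataFrames."""
    import sweetviz  # imported lazily so the split helper works without it

    # Each dataset is passed as [DataFrame, display name].
    return sweetviz.compare([train, "Train"], [test, "Test"])


# Example usage (requires sweetviz; the file name is illustrative):
# train, test = split_train_test(data)
# compare_report(train, test).show_html("train_vs_test.html")
```

In the resulting report, each feature shows both datasets in the same panel, which makes shifts between the two splits easier to spot.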

Installation¶

pip install sweetviz

Importing Libraries¶

In [12]:

import pandas as pd
import sweetviz 
from IPython.display import display

In [13]:

path = 'C:/Fimran/Work/Research/HR/'

Data Description¶

The company had 1,470 employees over the last year. The HR team has collected and consolidated information about each employee from various sources (such as surveys, appraisal reports, and the salary structure).

In [14]:

data = pd.read_csv(path + 'HR_Data.csv',index_col = 0)

In [15]:

data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1470 entries, 0 to 1469
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   EmpId               1470 non-null   int64  
 1   Attrition           1470 non-null   object 
 2   Department          1470 non-null   object 
 3   Age                 1470 non-null   int64  
 4   EducationField      1470 non-null   object 
 5   Gender              1470 non-null   object 
 6   MaritalStatus       1470 non-null   object 
 7   HourlyRate          1468 non-null   float64
 8   MonthlyIncome       1470 non-null   object 
 9   PercentSalaryHike   1470 non-null   int64  
 10  JobRole             1470 non-null   object 
 11  JobSatisfaction     1470 non-null   int64  
 12  PerformanceRating   1470 non-null   int64  
 13  TotalWorkingYears   1470 non-null   int64  
 14  YearsInCurrentRole  1470 non-null   int64  
dtypes: float64(1), int64(7), object(7)
memory usage: 183.8+ KB
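The summary above shows `MonthlyIncome` stored as `object` rather than a numeric type, and `HourlyRate` with two missing values (1468 of 1470 non-null). Left as text, `MonthlyIncome` would likely be treated by Sweetviz as a categorical or text feature instead of getting a histogram. A possible cleanup step is sketched below; it assumes the column holds numbers stored as strings, perhaps with thousands separators or stray whitespace, which the original notebook does not confirm. `to_numeric_column` is a hypothetical helper name.

```python
import pandas as pd


def to_numeric_column(series: pd.Series) -> pd.Series:
    """Convert a text column of numbers to a numeric dtype.

    Assumes values look like "5,993" or " 2090 "; anything that still
    cannot be parsed becomes NaN instead of raising an error.
    """
    cleaned = series.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


# Example usage on the tutorial's DataFrame:
# data["MonthlyIncome"] = to_numeric_column(data["MonthlyIncome"])
# data["MonthlyIncome"].isna().sum()  # number of values that failed to parse
```

Checking the count of unparsed values afterwards is worthwhile, because `errors="coerce"` silently turns bad entries into missing values.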

Let’s analyze our dataset using the Sweetviz library:¶

In [16]:

report = sweetviz.analyze(data)

Clicking a feature in the report expands its full details.¶

Here are some of the things that a Sweetviz report can visualize:

Data summary: Sweetviz provides a comprehensive summary of your data, including the number of rows and columns, data types, missing values, and unique values.

Distribution: Sweetviz draws a histogram for each numerical feature and a bar chart for each categorical feature, alongside summary statistics such as the range, quartiles, and most frequent values.

Correlation: Sweetviz shows numerical correlations (Pearson, from -1 to 1) in the Associations matrix of the report, so strongly related numerical features can be spotted at a glance.

Comparison: Sweetviz can compare two datasets, such as training and test sets, and show each feature's distribution, missing values, and summary statistics side by side.

Target analysis: Sweetviz can analyze the relationship between each feature and a target variable by overlaying the target's average value on the feature's histogram or bar chart. The target must be a boolean or numerical feature.

Association: Sweetviz also measures associations that involve categorical features, using the uncertainty coefficient for categorical pairs and the correlation ratio for categorical-numerical pairs (both from 0 to 1), and displays them in the same Associations matrix.
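As a sketch of the target analysis described above: Sweetviz accepts only boolean or numerical targets, and `Attrition` is stored as `object` in this dataset, so it has to be converted first. The code below assumes the labels are "Yes" and "No", which the notebook does not show; adjust `positive` if they differ. `attrition_to_bool` and `target_report` are hypothetical helper names. `EmpId` is skipped through `sweetviz.FeatureConfig` because an identifier carries no analytical signal.

```python
import pandas as pd


def attrition_to_bool(series: pd.Series, positive: str = "Yes") -> pd.Series:
    """Map a text label column to True/False (assumes labels like "Yes"/"No")."""
    return series.astype(str).str.strip().str.lower() == positive.lower()


def target_report(df: pd.DataFrame, target: str = "Attrition"):
    """Build a Sweetviz report that breaks every feature down by the target."""
    import sweetviz  # lazy import: only needed when a report is built

    prepared = df.copy()
    prepared[target] = attrition_to_bool(prepared[target])
    # Skip the identifier column so it does not appear in the report.
    config = sweetviz.FeatureConfig(skip="EmpId")
    return sweetviz.analyze(prepared, target_feat=target, feat_cfg=config)


# Example usage (requires sweetviz):
# target_report(data).show_notebook(layout="vertical")
```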

In [17]:

report.show_notebook(layout = 'vertical')

Conclusion¶

Overall, Sweetviz reports provide a rich set of visualizations that help you quickly identify patterns, anomalies, and relationships in your data, which makes Sweetviz a valuable tool for exploratory data analysis.
