Sweetviz is an open-source Python library for exploratory data analysis. It generates detailed, visual reports from a pandas DataFrame with just a few lines of code.
The system is built around quickly visualizing target values and comparing datasets. Its goal is to help with quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.
Some key features of Sweetviz include:
pip install sweetviz
import pandas as pd
import sweetviz
from IPython.display import display
path = 'C:/Fimran/Work/Research/HR/'
The company had 1470 employees over the last year. The HR team has collected and consolidated information about each employee from various sources (such as surveys, appraisal reports, and salary structures).
data = pd.read_csv(path + 'HR_Data.csv',index_col = 0)
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1470 entries, 0 to 1469
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   EmpId               1470 non-null   int64
 1   Attrition           1470 non-null   object
 2   Department          1470 non-null   object
 3   Age                 1470 non-null   int64
 4   EducationField      1470 non-null   object
 5   Gender              1470 non-null   object
 6   MaritalStatus       1470 non-null   object
 7   HourlyRate          1468 non-null   float64
 8   MonthlyIncome       1470 non-null   object
 9   PercentSalaryHike   1470 non-null   int64
 10  JobRole             1470 non-null   object
 11  JobSatisfaction     1470 non-null   int64
 12  PerformanceRating   1470 non-null   int64
 13  TotalWorkingYears   1470 non-null   int64
 14  YearsInCurrentRole  1470 non-null   int64
dtypes: float64(1), int64(7), object(7)
memory usage: 183.8+ KB
report = sweetviz.analyze(data)
Here are some of the things that a Sweetviz report can visualize:
Data summary: Sweetviz provides a comprehensive summary of your data, including the number of rows and columns, data types, missing values, and unique values.
Distribution: Sweetviz draws a histogram for each numerical feature and a bar chart of the most frequent values for each categorical feature, alongside summary statistics such as the range, quartiles, mean, and standard deviation.
Correlation: Sweetviz computes Pearson's correlation between pairs of numerical features and displays the results in the report's Associations graph.
Comparison: Sweetviz can compare two datasets (for example, training vs testing data) or two subsets of the same dataset, showing each feature's distribution, missing values, and summary statistics side by side.
Target analysis: Sweetviz can show how a target variable relates to every other feature by overlaying the target's average value on each feature's histogram or bar chart. The target must be a boolean or numerical feature.
Association: For categorical features, Sweetviz reports the uncertainty coefficient (categorical vs categorical) and the correlation ratio (categorical vs numerical), shown together with the numerical correlations in a single Associations graph.
report.show_notebook(layout = 'vertical')
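The target-analysis and comparison features listed above could be used roughly as follows. This is a minimal sketch, not the original notebook's code. It assumes that Attrition holds the strings 'Yes' and 'No' (the info() output only shows that it is an object column), and the helper names encode_attrition and build_reports, the 70/30 split, and the toy rows are all hypothetical. Because Sweetviz accepts only boolean or numerical targets, Attrition is converted to a boolean first.

```python
import pandas as pd


def encode_attrition(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Attrition converted to booleans.

    Assumption: the column stores 'Yes'/'No' strings. Any other value
    (including a missing one) becomes False, so check the raw values
    in HR_Data.csv before relying on this.
    """
    out = df.copy()
    out["Attrition"] = out["Attrition"].eq("Yes")
    return out


def build_reports(df: pd.DataFrame, test_fraction: float = 0.3, seed: int = 0):
    """Build a target-analysis report and a train-vs-test comparison."""
    import sweetviz  # imported here so encode_attrition works without it

    prepared = encode_attrition(df)

    # Hypothetical random split; assumes the DataFrame index is unique.
    test = prepared.sample(frac=test_fraction, random_state=seed)
    train = prepared.drop(test.index)

    # Target analysis: every feature is plotted against Attrition.
    target_report = sweetviz.analyze(prepared, target_feat="Attrition")

    # Comparison: the two subsets are shown side by side under the given names.
    comparison = sweetviz.compare(
        [train, "Train"], [test, "Test"], target_feat="Attrition"
    )
    return target_report, comparison


# Hypothetical toy rows that reuse three column names from the dataset.
toy = pd.DataFrame(
    {
        "Attrition": ["Yes", "No", "No", "Yes"],
        "Age": [41, 49, 37, 33],
        "Department": ["Sales", "Research", "Research", "Sales"],
    }
)
encoded = encode_attrition(toy)
print(encoded["Attrition"].tolist())
```

With the real data, calling target_report, comparison = build_reports(data) and then comparison.show_notebook(layout = 'vertical') would display the comparison inline, in the same way as the report above.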