Exploring Python’s Grammar of Graphics with Plotnine
Data analysis and exploration rely heavily on visualization to uncover insights, patterns, and trends in raw data. Python offers many libraries for creating engaging and informative visualizations. Among these, one library stands out for its versatility and effectiveness: plotnine. Let’s look at the capabilities and features of plotnine in this post.
Introduction to Plotnine:
plotnine is a data visualization library based on the grammar of graphics, which provides a consistent and intuitive approach to constructing visualizations. Inspired by the R package ggplot2, plotnine offers a high-level interface for creating complex plots with minimal code.
Getting Started:
You will first need to install the library. You can do so via pip: pip install plotnine. Once installed, you can import plotnine and the other required libraries:
from plotnine import *
import pandas as pd
Importing the data:
Here’s a snapshot of the data we will be working with today:
df = pd.read_csv("customer_shopping_data.csv")
df.head(3)
| | invoice_no | age | category | price | invoice_date |
|---|---|---|---|---|---|
| 0 | I138884 | 28 | Clothing | 1500.40 | 05-08-2022 |
| 1 | I227836 | 28 | Clothing | 1500.40 | 24-05-2022 |
| 2 | I121056 | 49 | Cosmetics | 40.66 | 13-03-2022 |
This post covers the syntax for a bar chart and a line graph. The same approach extends to other plots such as box plots and scatter plots, which you can try out afterwards.
1. Bar Chart
Let’s try to make a bar chart depicting total sales in various categories for the year 2022.
category_sales = df.groupby('category').agg({'price': lambda x: round(x.sum(), 2)}).reset_index()  # aggregating sales by category
category_sales.head(3)
| | category | price |
|---|---|---|
| 0 | Books | 106822.65 |
| 1 | Clothing | 14070451.12 |
| 2 | Cosmetics | 855405.08 |
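The lambda in the aggregation can also be written without a custom function. Below is a sketch of an equivalent form, run on just the three sample rows from the data snapshot rather than the full CSV, so the totals differ from the table above.

```python
import pandas as pd

# The three sample rows from the snapshot, not the full dataset
sample = pd.DataFrame({
    "category": ["Clothing", "Clothing", "Cosmetics"],
    "price": [1500.40, 1500.40, 40.66],
})

# Sum price per category; as_index=False keeps 'category' as a regular column,
# and the dict form of round() touches only the 'price' column
sample_sales = (
    sample.groupby("category", as_index=False)["price"]
    .sum()
    .round({"price": 2})
)
print(sample_sales)
```

The result has the same two columns, category and price, as category_sales, so it can be passed to ggplot() in the same way.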
bar_plot = (
    ggplot(category_sales, aes(x='category', y='price')) +  # specifies the data and the variables mapped to the two axes
    geom_bar(stat='identity', fill='skyblue') +  # geom_bar() adds a bar layer to the plot
    geom_text(aes(label='price'), size=8, color='black', va='bottom', ha='center') +  # adds value labels and sets their alignment
    labs(title='Total Sales in Each Category', x='Category', y='Total Sales') +  # plot title and axis titles
    theme(axis_text_x=element_text(angle=45, hjust=1))  # rotates the x-axis text so that the labels do not overlap
)
bar_plot
2. Line Graph
Now, let’s make a line graph showing month-wise sales for the year 2022.
df['invoice_date'] = pd.to_datetime(df['invoice_date'], format='%d-%m-%Y')  # converting invoice_date to datetime object
df['month'] = df['invoice_date'].dt.month  # extracting month from invoice_date

monthly_sales = df.groupby(['month'])['price'].sum().reset_index()  # aggregating sales by month
monthly_sales.head(3)
| | month | price |
|---|---|---|
| 0 | 1 | 2656149.96 |
| 1 | 2 | 2318201.08 |
| 2 | 3 | 2705190.76 |
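Grouping by month number alone adds together the same month from different years. If the file covers more than 2022 (an assumption, since the post does not say), filtering on the year first keeps the chart true to its title. Here is a sketch with a few made-up rows:

```python
import pandas as pd

# Hypothetical rows for illustration; dates use the same day-month-year layout
raw = pd.DataFrame({
    "invoice_date": ["05-08-2022", "24-05-2022", "13-03-2021", "01-03-2022"],
    "price": [1500.40, 1500.40, 40.66, 10.00],
})

# format='%d-%m-%Y' makes pandas read 05-08-2022 as 5 August, not 8 May
raw["invoice_date"] = pd.to_datetime(raw["invoice_date"], format="%d-%m-%Y")

# Keep 2022 only, then aggregate by month
sales_2022 = raw[raw["invoice_date"].dt.year == 2022].copy()
sales_2022["month"] = sales_2022["invoice_date"].dt.month
monthly_2022 = sales_2022.groupby("month", as_index=False)["price"].sum()
print(monthly_2022)
```

The 2021 row is dropped before the sum, so March contains only the 2022 invoice.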
line_graph = (
    ggplot(monthly_sales, aes(x='month', y='price')) +
    geom_line(color='blue') +  # geom_line() adds a line layer to the plot
    labs(title='Sales Over Months in 2022', x='Month', y='Sales') +
    # mapping month numbers to month names
    scale_x_continuous(breaks=list(range(1, 13)), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
)
line_graph
A few features that set plotnine apart from other visualization libraries are:
- Grammar of graphics: Provides a structured approach by breaking plots down into components such as data, aesthetics, and geometric objects, which makes plots easier to understand and customize.
- Concise syntax: Allows users to create sophisticated plots with minimal code. Because its syntax closely resembles that of ggplot2, users familiar with R can transition to Python more easily.
- Integration: Works directly with pandas DataFrames. plotnine is built on Matplotlib, so a plot can be customized further with Matplotlib, and it can be used alongside libraries such as Seaborn in the same workflow.
Conclusion:
With its elegance and simplicity, plotnine elevates the art of data visualization, enabling users to transform raw data into compelling narratives through captivating visualizations. Whether you’re a data scientist, analyst, or enthusiast, plotnine serves as a valuable tool in your data exploration arsenal. So, why not unleash the power of plotnine in your next data visualization project?