Exploring Python’s Grammar of Graphics with Plotnine

Data analysis and exploration rely heavily on visualization to uncover insights, patterns, and trends within raw data. Python offers a myriad of libraries for creating visually engaging and informative visualizations. Among these, one library stands out for its versatility and effectiveness: plotnine. Let’s delve deeper into the capabilities and features of plotnine in this post.

Introduction to Plotnine:

plotnine is a data visualization library based on the grammar of graphics, which provides a consistent and intuitive approach to constructing visualizations. Inspired by the R package ggplot2, plotnine offers a high-level interface for building complex plots from a few composable pieces.

Getting Started:

You will first need to install the library. You can do so via pip: pip install plotnine. Once installed, you can import plotnine and other required libraries:

from plotnine import *
import pandas as pd

Importing the data:

Here’s a snapshot of the data we will be working with today:

df = pd.read_csv("customer_shopping_data.csv")
df.head(3)

  invoice_no  age   category    price invoice_date
0    I138884   28   Clothing  1500.40   05-08-2022
1    I227836   28   Clothing  1500.40   24-05-2022
2    I121056   49  Cosmetics    40.66   13-03-2022

This post walks through the syntax for a bar chart and a line graph. Afterwards, you can try out other plots such as box plots and scatter plots.

1. Bar Chart

Let’s try to make a bar chart depicting total sales in various categories for the year 2022.

# aggregate sales by category, rounded to two decimal places
category_sales = df.groupby('category').agg({'price': lambda x: round(x.sum(), 2)}).reset_index()
category_sales.head(3)

    category        price
0      Books    106822.65
1   Clothing  14070451.12
2  Cosmetics    855405.08

bar_plot = (
    # the data, and the columns mapped to the x and y axes
    ggplot(category_sales, aes(x='category', y='price')) +
    # geom_bar() adds a bar layer; stat='identity' uses the y values as bar heights
    geom_bar(stat='identity', fill='skyblue') +
    # geom_text() prints each total above its bar, horizontally centered
    geom_text(aes(label='price'), size=8, color='black', va='bottom', ha='center') +
    # plot title and axis titles
    labs(title='Total Sales in Each Category', x='Category', y='Total Sales') +
    # rotate the x-axis labels so that they do not overlap
    theme(axis_text_x=element_text(angle=45, hjust=1))
)

bar_plot

2. Line Graph

Now, let’s make a line graph showing month-wise sales for the year 2022.

df['invoice_date'] = pd.to_datetime(df['invoice_date'], format='%d-%m-%Y') # converting invoice_date to datetime object
df['month'] = df['invoice_date'].dt.month # extracting month from invoice_date

monthly_sales = df.groupby(['month'])['price'].sum().reset_index() # aggregating sales by month
monthly_sales.head(3)

   month       price
0      1  2656149.96
1      2  2318201.08
2      3  2705190.76

line_graph = (
    ggplot(monthly_sales, aes(x='month', y='price')) +
    geom_line(color='blue') + # geom_line() adds a line layer to the plot
    labs(title='Sales Over Months in 2022', x='Month', y='Sales') +
    # mapping month numbers to month names 
    scale_x_continuous(breaks=list(range(1, 13)), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
)

line_graph
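
One caveat: the aggregation above groups on the month number alone, so if the file held invoices from more than one year, January 2021 and January 2022 would be summed together. The post does not say whether that is the case, so treat the following as a precaution. It is a small sketch with made-up rows that keeps only 2022 before grouping.

```python
import pandas as pd

# Made-up rows for illustration; the last one falls outside 2022
df = pd.DataFrame({
    "price": [100.0, 50.0, 25.0, 10.0],
    "invoice_date": ["05-01-2022", "20-01-2022", "13-03-2022", "07-01-2021"],
})
df["invoice_date"] = pd.to_datetime(df["invoice_date"], format="%d-%m-%Y")

# Keep only invoices dated in 2022, then derive the month number
sales_2022 = df[df["invoice_date"].dt.year == 2022].copy()
sales_2022["month"] = sales_2022["invoice_date"].dt.month

# Total sales per month of 2022
monthly_sales = sales_2022.groupby("month", as_index=False)["price"].sum()
```

The resulting monthly_sales frame has the same month and price columns as before, so it can be passed to the line graph code unchanged.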

A few features of plotnine that set it apart from other visualization libraries are:

  • Grammar of graphics: Provides a structured approach by breaking plots down into components such as data, aesthetics, and geometric objects, which makes plots easier to understand and customize.
  • Concise syntax: Allows users to create sophisticated plots with minimal code. Its syntax closely resembles that of ggplot2, which eases the transition to Python for users familiar with R.
  • Integration: Works directly with pandas DataFrames. It is built on Matplotlib, so a plot can be drawn to a Matplotlib figure for additional customization, and it can be used alongside libraries like Seaborn.

Conclusion:

With its elegance and simplicity, plotnine elevates the art of data visualization, enabling users to transform raw data into compelling narratives through captivating visualizations. Whether you’re a data scientist, analyst, or enthusiast, plotnine serves as a valuable tool in your data exploration arsenal. So, why not unleash the power of plotnine in your next data visualization project?