Gantt Chart in Python

What is a Gantt chart?

A Gantt chart is a type of bar chart that illustrates the duration and timing of various tasks or events. Each task is represented as a horizontal bar, with the length and position of the bar reflecting the start date, end date, and duration. In data science, it is particularly useful for visualizing and managing the scheduling of activities, analyzing overlaps, and assessing the progress of different tasks within a given timeframe.

Importing libraries

Library                  Description
pandas                   Data manipulation and analysis
matplotlib.pyplot        Plotting graphs and charts
matplotlib.dates         Date handling and formatting for plots
plotly.figure_factory    Creating interactive charts and visualizations

Importing data

The dataset provides a one-year timeline of various marketing campaigns, detailing the start and end dates along with the duration of each campaign in weeks.

   Media_Channel         Start        Finish       Duration (weeks)
0  Branded Search        2017-01-15   2017-03-15   8
1  Non-branded Search    2017-05-01   2017-08-20   15
2  Facebook              2017-02-01   2017-11-15   41
3  Print                 2017-09-01   2017-09-30   4
4  Out-of-Home (OOH)     2017-04-10   2017-06-25   10
5  TV                    2017-07-01   2017-09-01   8
6  Radio                 2017-10-01   2017-12-15   10
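
The table above can be rebuilt directly in pandas. The following is a minimal sketch of one way to do it. The variable name df matches the plotting code used later in the article. Deriving Duration (weeks) as the number of complete weeks between Start and Finish is an assumption, although it does reproduce the values shown in the table.

```python
import pandas as pd

# Campaign schedule copied from the table above
df = pd.DataFrame({
    "Media_Channel": [
        "Branded Search", "Non-branded Search", "Facebook", "Print",
        "Out-of-Home (OOH)", "TV", "Radio",
    ],
    "Start": [
        "2017-01-15", "2017-05-01", "2017-02-01", "2017-09-01",
        "2017-04-10", "2017-07-01", "2017-10-01",
    ],
    "Finish": [
        "2017-03-15", "2017-08-20", "2017-11-15", "2017-09-30",
        "2017-06-25", "2017-09-01", "2017-12-15",
    ],
})

# Parse the date strings so they can be subtracted and placed on a date axis
df["Start"] = pd.to_datetime(df["Start"])
df["Finish"] = pd.to_datetime(df["Finish"])

# Assumption: duration is the number of complete weeks between Start and Finish
df["Duration (weeks)"] = (df["Finish"] - df["Start"]).dt.days // 7

print(df)
```

Converting Start and Finish to datetime values up front matters because the plotting code subtracts one from the other to get each bar's length.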

Gantt chart using Matplotlib

With Matplotlib, each marketing campaign is drawn as a horizontal bar that spans its start and end dates on the x-axis. The resulting chart makes the duration and overlap of the campaigns easy to see and gives a clear overview of how they are distributed across the year.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Create the figure and axes for the Gantt chart
fig, ax = plt.subplots(figsize=(8, 5))
# Define one color per campaign (optional)
colors = ['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink']
# Draw one horizontal bar per campaign; Start and Finish must be datetime values
for i, task in enumerate(df.itertuples()):
    start = task.Start
    finish = task.Finish
    # The bar starts at the start date and its width is the campaign length in days
    ax.barh(task.Media_Channel, (finish - start).days, left=start, color=colors[i % len(colors)])
# Format the x-axis for dates
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# Add labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Campaigns')
ax.set_title('Marketing Campaign Gantt Chart using Matplotlib')
# Rotate date labels
plt.xticks(rotation=90)
# Show plot
plt.tight_layout()
plt.show()
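
The overlap that the chart shows visually can also be computed directly. The following is a small sketch using only the standard library. The campaigns dictionary and the overlap_days helper are hypothetical names introduced for illustration, and the dates are copied from the dataset above.

```python
from datetime import date
from itertools import combinations

# (start, finish) for each campaign, copied from the dataset
campaigns = {
    "Branded Search": (date(2017, 1, 15), date(2017, 3, 15)),
    "Non-branded Search": (date(2017, 5, 1), date(2017, 8, 20)),
    "Facebook": (date(2017, 2, 1), date(2017, 11, 15)),
    "Print": (date(2017, 9, 1), date(2017, 9, 30)),
    "Out-of-Home (OOH)": (date(2017, 4, 10), date(2017, 6, 25)),
    "TV": (date(2017, 7, 1), date(2017, 9, 1)),
    "Radio": (date(2017, 10, 1), date(2017, 12, 15)),
}


def overlap_days(a, b):
    """Return the number of days two (start, finish) intervals share, or 0."""
    latest_start = max(a[0], b[0])
    earliest_finish = min(a[1], b[1])
    return max((earliest_finish - latest_start).days, 0)


# Keep only the campaign pairs that actually overlap
overlaps = {}
for first, second in combinations(campaigns, 2):
    days = overlap_days(campaigns[first], campaigns[second])
    if days > 0:
        overlaps[(first, second)] = days

# List the pairs from longest to shortest overlap
for (first, second), days in sorted(overlaps.items(), key=lambda item: -item[1]):
    print(f"{first} / {second}: {days} days")
```

For instance, Non-branded Search runs entirely within the Facebook campaign, so their overlap equals its full 111-day span, while Print and Radio never run at the same time.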

Gantt chart using Plotly

  • The create_gantt function from Plotly’s figure_factory module builds the Gantt chart. It takes a list of dictionaries (or a DataFrame) in which each record represents a task with the keys Task (name of the task), Start (start date), and Finish (end date).
  • The resulting chart displays each marketing campaign as a horizontal bar running from its start date to its end date. Because the chart is interactive, hovering over a bar shows details such as the start and end dates of that campaign.

import plotly.figure_factory as ff

# create_gantt expects the keys Task, Start, and Finish, so rename Media_Channel to Task
df_dict = df.rename(columns={'Media_Channel': 'Task'}).to_dict(orient='records')
# Create the Gantt chart using Plotly's figure_factory
fig = ff.create_gantt(
    df_dict,
    show_colorbar=True,
    index_col='Task',
    title="Gantt Chart of Marketing Campaigns using Plotly",
    showgrid_x=True,
    showgrid_y=True,
)
# Update layout to center the title and adjust other settings
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Marketing Channels',
    xaxis=dict(
        tickformat="%m-%d-%Y",
        tickangle=90  # Rotate the x-axis labels to 90 degrees
    ),
    yaxis=dict(
        autorange="reversed"  # Reverse the y-axis to have the first channel at the top
    ),
    title=dict(
        text="Gantt Chart of Marketing Campaigns using Plotly",  # Set the title text
        x=0.5,  # Center the title horizontally
        xanchor='center',  # Anchor the title in the center
        pad=dict(t=20)  # Add padding to the top of the title to create space
    ),
    margin=dict(t=100)  # Adjust top margin to ensure title and graph do not overlap
)
# Show the chart
fig.show()

Conclusion

A Gantt chart provides a clear visual representation of the timing and duration of tasks or events. It helps in analyzing and managing the scheduling of activities, identifying overlaps, and assessing the progress of different components over a defined period. This makes it a useful tool for understanding the temporal aspects of data-driven processes and optimizing workflows.