This tutorial is the first in this series that covers visualising data in Python. We start with bar charts because they are common and easy to work with.
Bar Charts
The bar chart is a visualisation where the height or length of each bar represents a value. It is a simple chart most of us are familiar with. This combination of simplicity and familiarity is the bar chart's superpower. It makes it one of the most effective charts as most people can easily read and understand it.
Bar charts are great for showing comparisons across categories. For example, we can show the number of students enrolled for the different classes in a faculty. We can also use them to show and compare the changes in values over time, like monthly sales in a year. Bar charts can show a single series of data or more than one series. A bar chart that shows more than one series side by side is called a grouped bar chart.
Formatting Tips and Best Practices
While the simplicity of the bar chart is its superpower, it can also be its downfall if not formatted well. Here, we look at some things to consider when creating and formatting bar charts.
Baseline
The baseline of the chart should be zero. Let's look at the two charts below.
The first chart with a non-zero baseline makes it seem as if the gaps are wider than they are. Using the same data, the second chart shows that the gap is smaller. Altering the baseline to a different value may lead the audience to make inaccurate conclusions based on the chart.
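To see the effect in code, here is a minimal sketch that plots the same values twice, once with a truncated y-axis and once starting at zero. The values and the lower limit of 90 are made up for illustration; they are not the data behind the charts above.

```python
import matplotlib.pyplot as plt

# Made-up values that differ by only a few units
labels = ['A', 'B', 'C']
values = [95, 97, 100]

fig, (ax_truncated, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))

# Truncated baseline: the y-axis starts at 90, so small gaps look large
ax_truncated.bar(labels, values, color='#29AB87')
ax_truncated.set_ylim(90, 101)
ax_truncated.set_title('Baseline at 90')

# Zero baseline: bar heights are proportional to the values
ax_zero.bar(labels, values, color='#29AB87')
ax_zero.set_ylim(0, 101)
ax_zero.set_title('Baseline at 0')
```

With matplotlib's default settings, bars of positive values are drawn from zero, so the truncated version only appears when the axis limits are changed explicitly.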
Bars
Avoid using 3-D shapes as they make it harder to map the length to the value on the axis. Aim for consistency and readability by keeping all bars the same width. The width of the bars should be bigger than the whitespace between the bars.
Using pictures instead of bars
Substituting pictures for the bars can be a great way to make the chart engaging. However, a common mistake is not scaling the pictures in proportion to the data. Let's look at the visualisation below.
The second picture is longer than the first. While this may be correct, it is also wider. This inflates the value, making it appear larger than it is. This can be misleading.
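A quick calculation shows how large the distortion can be. The sketch below assumes a picture that is scaled uniformly, so its width grows by the same factor as its height; the two values are illustrative only.

```python
# Illustrative numbers: the second value is twice the first
value_a, value_b = 10, 20

# Uniform scaling multiplies the height AND the width by the same factor
scale = value_b / value_a
height_ratio = scale          # what the chart is meant to encode
area_ratio = scale * scale    # what the eye tends to compare

print(height_ratio)  # 2.0
print(area_ratio)    # 4.0
```

The value only doubled, but the picture covers four times the area. Keeping the width fixed and changing only the length avoids this.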
Sorting the bars
Sorting the bars can make the chart easier to read and understand. This reduces the cognitive load of the audience and allows them to easily make their comparisons. Only do this if the data has no inherent order and it is not time-based.
Grouped bar charts
Limit the number of series in your chart. Too many may overwhelm the audience and reduce the readability of the chart. Likewise, not choosing colours well may have the same effect. Choose harmonious colours that blend well together. Ensure that the order in which the bars appear is consistent for each group. Whitespace is essential for grouped bar charts as it distinguishes the groups. Make sure you're using the right amount of whitespace.
Creating bar charts in Python using matplotlib
We will create our bar charts using matplotlib, pandas and numpy. We will go through how to create the following:
vertical bar chart (column chart).
horizontal bar chart.
grouped bar charts.
stacked bar charts.
Let's get into the code.
We start by importing our modules. Next, we create a dictionary named units to convert into a dataframe.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
units = {
    'Items': ['Sweaters', 'T-Shirts', 'Jeans', 'Sneakers'],
    'Q1': [40, 44, 32, 35],
    'Q2': [52, 30, 45, 41],
    'Q3': [60, 35, 20, 44],
    'Q4': [19, 69, 41, 28]
}
df = pd.DataFrame(units)
Running the code creates a dataframe like the table below. This data shows the number of units sold per quarter for each item in 2022.
Items | Q1 | Q2 | Q3 | Q4 |
Sweaters | 40 | 52 | 60 | 19 |
T-Shirts | 44 | 30 | 35 | 69 |
Jeans | 32 | 45 | 20 | 41 |
Sneakers | 35 | 41 | 44 | 28 |
Let's get started now that we have what we need. First on our list is a vertical bar chart. This is the most common bar chart and the easiest to create.
Vertical Bar Chart
We use the function plt.bar to create a vertical bar chart. This function takes at least two arguments: the x values and the height (y) values. You can also specify a colour of your choosing (or a different colour for each bar) with the color keyword argument.
Let's create a bar chart for the number of units sold in the first quarter. First, we create a second dataframe that has been sorted based on the Q1 column. The x-axis will display the items and our y-axis will display the number of units sold for each item. We will also add x and y labels, and give our chart a title.
df_sorted = df.sort_values('Q1')
plt.bar(df_sorted['Items'], df_sorted['Q1'], color='#29AB87')
plt.xlabel('Items')
plt.ylabel('Number of Items Sold')
plt.title('Number of Items Sold in Q1')
Running the code produces the visual below.
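The color argument also accepts a list with one colour per bar. Here is a minimal sketch that reuses the Q1 figures and highlights the best-selling item; the choice of grey for the remaining bars is just an example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Q1 figures from the tutorial data, sorted from lowest to highest
q1 = pd.DataFrame({
    'Items': ['Sweaters', 'T-Shirts', 'Jeans', 'Sneakers'],
    'Q1': [40, 44, 32, 35],
}).sort_values('Q1')

# One colour per bar: grey for every bar except the best seller
colours = ['#8b8c89'] * len(q1)
colours[-1] = '#29AB87'  # after sorting, the last row holds the largest value

fig, ax = plt.subplots()
bars = ax.bar(q1['Items'], q1['Q1'], color=colours)
ax.set_xlabel('Items')
ax.set_ylabel('Number of Items Sold')
ax.set_title('Number of Items Sold in Q1')
```

The list must contain as many colours as there are bars if every bar is to get its own colour.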
And that's it for our vertical bar chart. Let's represent the data with a horizontal bar chart.
Horizontal Bar Chart
Horizontal bar charts are excellent to use when the names of your categories are lengthy. This makes them simple to read. To create a horizontal bar chart, we use plt.barh. Using the same data as above, plotting the bar chart would look like this:
plt.barh(df_sorted['Items'], df_sorted['Q1'], color='#29AB87')
# The axes swap roles in a horizontal chart: values run along x, items along y
plt.xlabel('Number of Items Sold')
plt.ylabel('Items')
plt.title('Number of Items Sold in Q1')
Running our code gives us the chart below.
Next up is the grouped bar chart.
Grouped Bar Chart
Creating a grouped bar chart requires extra effort. There are a few ways to go about it. All these methods rely on the same underlying logic. The chart needs to display multiple series next to each other. Also, we need to group the series into their relevant categories.
For our data, we need to display units sold for each item in a quarter, grouped into different quarters. To achieve this, we have to get creative with the x-axis. First, we create a numpy array containing our x-tick values. We have four categories, so our array will have the numbers 0, 1, 2 and 3. Next, we define a width for the bars. Defining a width for the bars gives us a value to use to adjust where we plot the bars on the x-axis.
x = np.arange(4)  # x-tick positions 0, 1, 2 and 3, one per category
w = 0.2
Now we subtract from and add to the x-tick values to position our bars. Because we have an even number of bars to plot, the x-tick needs to sit at the halfway point between the two middle bars, and not in the centre of a bar as it usually does. We keep the distance between neighbouring bars equal to the bar width, so the bars touch without overlapping.
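# Index by item so that each row holds one item's sales for Q1 to Q4
sales = df.set_index('Items')
# One series per item, placed side by side around each quarter's x-tick
plt.bar(x-0.3, sales.loc['Sweaters'], width=w, color='#274c77')
plt.bar(x-0.1, sales.loc['T-Shirts'], width=w, color='#29AB87')
plt.bar(x+0.1, sales.loc['Jeans'], width=w, color='green')
plt.bar(x+0.3, sales.loc['Sneakers'], width=w, color='#8b8c89')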
This code creates our base grouped bar chart. Let's add some formatting to it to make it readable. We create two lists: one with our category names and the other with our items. We use the first list to label the x-ticks and the second list to create a legend for the chart. Finally, we add the x and y labels and a title, and we're done. Here, we did not sort the data because it is time-based.
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
legend = list(df['Items'])
plt.xticks(x, quarters)
plt.xlabel('Quarters 2022')
plt.ylabel("Units Sold")
plt.title('Number of Units Sold per Quarter in 2022')
plt.legend(legend)
Running the code will give us our finished product which is the chart below.
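The offsets -0.3, -0.1, 0.1 and 0.3 used above can be computed rather than typed by hand, which helps when the number of series changes. The sketch below is one way to do it, assuming every bar has the same width and the group is centred on its x-tick. The names n_series and offsets are my own and not part of matplotlib.

```python
import numpy as np

n_series = 4   # number of bars in each group
w = 0.2        # width of each bar

# Centre the group on the tick, then space the bars one bar width apart
offsets = (np.arange(n_series) - (n_series - 1) / 2) * w
print(offsets)  # approximately -0.3, -0.1, 0.1 and 0.3
```

With three series, the same expression gives -0.2, 0 and 0.2, so the middle bar sits directly on the tick.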
To create a grouped horizontal bar chart, we apply the same logic. Using plt.barh, we adjust the y-ticks instead of the x-ticks and set the height of the bars instead of the width. We also swap the x and y labels around.
y = np.arange(4)  # y-tick positions, one per quarter
sales = df.set_index('Items')  # each row holds one item's sales for Q1 to Q4
plt.barh(y-0.3, sales.loc['Sweaters'], height=0.2, color='#274c77')
plt.barh(y-0.1, sales.loc['T-Shirts'], height=0.2, color='#29AB87')
plt.barh(y+0.1, sales.loc['Jeans'], height=0.2, color='green')
plt.barh(y+0.3, sales.loc['Sneakers'], height=0.2, color='#8b8c89')
plt.yticks(y, quarters)
plt.xlabel("Units Sold")
plt.ylabel('Quarters 2022')
plt.title('Number of Units Sold per Quarter in 2022')
plt.legend(legend)
Our grouped horizontal bar chart should look like this once we run the code.
Last but not least, we will plot a stacked bar chart.
Stacked Bar Chart
Stacked bar charts are best for comparing totals across categories. They allow us to visualise values as parts of a whole. The entire bar represents the whole, while all the values within represent the parts.
Generating a stacked bar chart is a simple process. We use the parameter bottom= to stack the bars on top of each other.
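sales = df.set_index('Items')  # each row holds one item's sales for Q1 to Q4
# One series per item, in the same order as the legend
sweaters, tshirts, jeans, sneakers = (sales.loc[item] for item in sales.index)
plt.bar(quarters, sweaters, color='#274c77')
plt.bar(quarters, tshirts, bottom=sweaters, color='#29AB87')
plt.bar(quarters, jeans, bottom=sweaters+tshirts, color='green')
plt.bar(quarters, sneakers, bottom=sweaters+tshirts+jeans, color='#8b8c89')
plt.xlabel('Quarters 2022')
plt.ylabel("Units Sold")
plt.title('Number of Units Sold per Quarter in 2022')
plt.legend(legend)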
Running the code gives us our stacked bar chart as follows.
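Writing out each bottom= sum by hand gets tedious as the number of layers grows. One alternative, sketched below, is to keep a running total in a numpy array and add each layer to it. Here each item becomes one layer and each quarter one bar. The loop and the name running_total are illustrative choices, not the only way to do this.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Items': ['Sweaters', 'T-Shirts', 'Jeans', 'Sneakers'],
    'Q1': [40, 44, 32, 35],
    'Q2': [52, 30, 45, 41],
    'Q3': [60, 35, 20, 44],
    'Q4': [19, 69, 41, 28],
})
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
colours = ['#274c77', '#29AB87', 'green', '#8b8c89']

fig, ax = plt.subplots()
running_total = np.zeros(len(quarters))  # height already stacked in each quarter

for (_, row), colour in zip(df.iterrows(), colours):
    heights = row[quarters].to_numpy(dtype=float)
    ax.bar(quarters, heights, bottom=running_total,
           color=colour, label=row['Items'])
    # The next layer starts where this one ends
    running_total = running_total + heights

ax.set_xlabel('Quarters 2022')
ax.set_ylabel('Units Sold')
ax.legend()
```

After the loop, running_total holds the total units sold in each quarter, which is also the full height of each stacked bar.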
To create a horizontal stacked bar chart, we use the same logic. Using plt.barh, we use the parameter left= to stack the bars side by side. We also swap the x and y labels around.
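sales = df.set_index('Items')  # each row holds one item's sales for Q1 to Q4
# One series per item, in the same order as the legend
sweaters, tshirts, jeans, sneakers = (sales.loc[item] for item in sales.index)
plt.barh(quarters, sweaters, color='#274c77')
plt.barh(quarters, tshirts, left=sweaters, color='#29AB87')
plt.barh(quarters, jeans, left=sweaters+tshirts, color='green')
plt.barh(quarters, sneakers, left=sweaters+tshirts+jeans, color='#8b8c89')
plt.xlabel("Units Sold")
plt.ylabel('Quarters 2022')
plt.title('Number of Units Sold per Quarter in 2022')
plt.legend(legend)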
Running the code gives us the chart below.
And that's a wrap. We have walked through how to create bar charts in Python using matplotlib.
Wrapping Up
In this tutorial, we:
discussed bar charts and their best use cases
explored some formatting tips and best practices
created the different types of bar charts using matplotlib
This tutorial serves as an introduction to visualising data and creating bar charts in Python. We covered just the basics. If you would like to learn more, I would recommend the following resources:
The official Matplotlib documentation.
storytelling with data by Cole Nussbaumer Knaflic - This is a great book to have for anyone working with data. You should be able to find it at your favourite bookstore or online retailer.
I hope you found this tutorial helpful and informative. Until next time.