Plotting a population pyramid in Python
Population pyramids are a graphical representation of the age-sex structure of a country or an area. They help us observe the structure of a particular population, what it was in the past and how it is likely to change in the future.
For one of my recent projects, I needed to plot one in Python. Let’s do it here.
First, we need some data. Let’s use UK population estimates for 2020 from the Office for National Statistics.
This is what my input data should look like:
18 rows, one for each age range, and one column for male and one for female. For a simple pyramid I can use counts. I could also convert the counts to percentages out of a total, but for now I will stick with counts.
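If you did want percentages instead of counts, a minimal sketch could look like the one below. The `uk_demo` frame and its numbers are made up purely for illustration (they are not the ONS figures), and the `male_pct` / `female_pct` column names are my own assumption. Each count is divided by the combined male and female total, so all the percentages together add up to 100.

```python
import pandas as pd

# Made-up counts purely for illustration; these are NOT the ONS figures.
uk_demo = pd.DataFrame({
    "age": ["0_4", "5_9", "10_14"],
    "male": [200, 300, 100],
    "female": [150, 150, 100],
})

# Total population across both sexes and all age ranges
total = uk_demo["male"].sum() + uk_demo["female"].sum()

# Share of the total population held by each age-sex group, in percent
uk_demo["male_pct"] = uk_demo["male"] / total * 100
uk_demo["female_pct"] = uk_demo["female"] / total * 100

print(uk_demo)
```

The percentage columns could then be used in place of the counts in the rest of the post, with the male column still being made negative before plotting.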
In order to have one gender on the left and the other on the right, I need to multiply the counts for one of the genders by -1. I also want to express the numbers in thousands so that they are more readable, so I will divide one column by -1,000 and the other by 1,000.
uk['male'] = uk['male'] / -1000
uk['female'] = uk['female'] / 1000
I also need to set the order of the age ranges (so that the youngest shows at the bottom and the oldest at the top). This will then be given as one of the arguments to my plotting function.
ages = ['85_', '80_84', '75_79', '70_74', '65_69', '60_64', '55_59', '50_54', '45_49', '40_44', '35_39', '30_34', '25_29', '20_24', '15_19', '10_14', '5_9', '0_4']
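Typing the list out by hand works fine, but it could also be generated. The sketch below is one way to do that; `age_labels` is a hypothetical helper of mine, and it assumes the labels follow the same `low_high` pattern as above, with five-year bands and an open-ended `85_` group at the top.

```python
def age_labels(step=5, top=85):
    """Build age-range labels like '0_4', '5_9', ..., '85_', oldest first."""
    # Closed bands below the top group: '0_4', '5_9', ..., '80_84'
    labels = [f"{low}_{low + step - 1}" for low in range(0, top, step)]
    # Open-ended band for the oldest group
    labels.append(f"{top}_")
    # Reverse so the oldest group comes first, matching the order used for plotting
    return labels[::-1]

ages = age_labels()
print(ages)
```

With the default arguments this produces the same 18 labels, in the same order, as the hand-written list.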
Finally, plot the graph.
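import matplotlib.pyplot as plt
import seaborn as sns
# Male counts are negative, so these bars extend to the left of zero
ax1 = sns.barplot(x='male', y='age', data=uk, order=ages, palette="Blues")
# Female counts are positive, so these bars extend to the right
ax2 = sns.barplot(x='female', y='age', data=uk, order=ages, palette="Greens")
plt.title("Population pyramid for the UK, 2020 estimates")
plt.xlabel("Male/Female")
plt.grid()
# Relabel the ticks so both sides show positive values (in thousands)
plt.xticks(ticks=[-2000, -1000, 0, 1000, 2000],
           labels=['2,000k', '1,000k', '0', '1,000k', '2,000k'])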
The plot requires two barplot calls, one for male and one for female. As the male counts have been converted to negative values, they show on the left side of the graph. In order for the x-axis labels to show positive values, I needed to override the xticks with the labels I wanted.
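Hard-coding the tick labels is the simplest option. As an alternative, the labels could be derived from the tick values with matplotlib's `FuncFormatter`, so they stay correct if the axis range changes. This is only a sketch: `abs_thousands` is a hypothetical helper name, the bar values are made up, and I use plain `barh` calls here instead of seaborn to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def abs_thousands(value, pos=None):
    """Format a tick value (already in thousands) as a positive label, e.g. '1,000k'."""
    if value == 0:
        return "0"
    return f"{abs(value):,.0f}k"

fig, ax = plt.subplots()
# Made-up values for illustration only: negative on the left, positive on the right
ax.barh(["0_4", "5_9"], [-1500, -1800])
ax.barh(["0_4", "5_9"], [1400, 1700])

# Show the absolute value on every x tick instead of the signed number
ax.xaxis.set_major_formatter(FuncFormatter(abs_thousands))
fig.savefig("pyramid_demo.png")
```

The formatter only changes how the ticks are displayed; the underlying data stays negative for the left-hand bars.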
That’s it. I now have a simple population pyramid.
Thanks for reading. You can find the code for this as well as the dataset used on my GitHub.