Boxplot
Description
The boxplot (also known as a box-and-whisker plot) uses boxes and lines to show the distributions of one or more groups of numeric data based on a five-number summary of the data points: the upper extreme (maximum), upper quartile (Q3), median, lower quartile (Q1), and lower extreme (minimum). Through these five values, the boxplot provides information about the variability and skewness of the distribution. The box limits indicate the range of the central 50% of the data, with a central line marking the median. Lines (whiskers) extend from each box to capture the range of the remaining data, with dots placed past the whisker ends to indicate outliers.
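The five values behind a boxplot can be computed directly. The sketch below is a minimal illustration using only Python's standard library. The sample data is made up, and the 1.5 x IQR cutoff for outliers is an assumed common convention, not something this guide prescribes; charting tools may also compute quartiles slightly differently.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between observed data points, so the
    # quartiles always lie between the smallest and largest value.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def find_outliers(values, k=1.5):
    """Return points farther than k * IQR from the box.

    k = 1.5 is an assumed convention, not a rule taken from this guide.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # height of the box: the central 50% of the data
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample data for illustration only.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(sample)
outliers = find_outliers(sample)
print(summary)   # (2, 4.0, 6.0, 8.0, 30)
print(outliers)  # [30]
```

In this sample the box spans 4 to 8 around a median of 6, and the value 30 would be drawn as a dot beyond the upper whisker.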
When to use
Boxplots are used to show distributions of numeric data values, especially when you want to compare them between multiple groups. They are built to provide high-level information at a glance, offering general information about a dataset’s symmetry, skew, variance, and outliers. They allow for a comparison of data from different categories for easier, more effective decision-making.
Dos and don’ts
Order the groups when they don’t have an inherent order (for example, sort them by median).
Use a horizontal boxplot when there are a lot of groups to plot, or if the group names are long.
Avoid using boxplots if you want to show the distribution in only one group. A histogram is recommended in this case.
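Ordering groups by median, as recommended above, is a one-line sort before the data is handed to a charting tool. This is a minimal sketch with hypothetical group names and values. It sorts from highest to lowest median, which is one reasonable choice among several.

```python
import statistics

# Hypothetical groups with no inherent order.
groups = {
    "North": [12, 15, 14, 20, 18],
    "South": [30, 28, 35, 31, 29],
    "East": [8, 9, 11, 7, 10],
}

# Sort the group names by the median of their values, highest first.
ordered_names = sorted(
    groups,
    key=lambda name: statistics.median(groups[name]),
    reverse=True,
)

# Data series in the same order, ready to pass to a plotting tool.
ordered_data = [groups[name] for name in ordered_names]
print(ordered_names)  # ['South', 'North', 'East']
```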
Tools available
MS Office, Power BI, R
Histogram
Description
A histogram displays the distribution of data over a continuous interval or specific time period. The height of each bar in a histogram indicates the frequency of data points within the interval/bin. It’s a great tool to identify where values are concentrated, or if there are extreme values or gaps in the dataset.
When to use
Histograms are good for showing the general distribution of dataset variables. You can see roughly where the peaks of the distribution are, whether the distribution is skewed or symmetric, and whether there are any outliers.
Dos and don’ts
Always start at a zero baseline.
Use an appropriate number of bins and interpretable bin boundaries.
Don’t use unequal bin sizes.
Don’t use histograms for non-continuous data (use a bar or column chart instead).
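The advice on equal bin sizes and interpretable boundaries can be illustrated with a small binning routine. This is a sketch only: the function name `equal_width_bins`, the sample ages, and the choice to snap the first edge down to a multiple of the bin width are all assumptions made for the example.

```python
import math

def equal_width_bins(values, bin_width, start=None):
    """Count values in equal-width bins of the form [edge, edge + bin_width)."""
    if start is None:
        # Snap the first edge down to a multiple of the bin width so the
        # boundaries are round, readable numbers (20, 30, 40, ...).
        start = math.floor(min(values) / bin_width) * bin_width
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    edges = [start + i * bin_width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical ages for illustration only.
ages = [23, 25, 31, 34, 35, 38, 41, 47, 52, 68]
edges, counts = equal_width_bins(ages, bin_width=10)
print(edges)   # [20, 30, 40, 50, 60, 70]
print(counts)  # [2, 4, 2, 1, 1]
```

Each count becomes the height of one bar, measured from a zero baseline. Trying a few values of `bin_width` is a simple way to check whether the shape of the distribution depends on the binning.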
Tools available
MS Office, Power BI, Illustrator, D3.js, Matplotlib, R
Population pyramid
Description
A population pyramid consists of two histograms placed back to back, one for each gender (conventionally, males on the left and females on the right), where the population numbers are shown horizontally (X-axis) and the age groups vertically (Y-axis). The values can be displayed either as a percentage of the total population or as raw numbers.
When to use
Population pyramids are among the most effective visualizations for analyzing changes or differences in population groups. They show how a population breaks down by age and sex, which can in turn point to other characteristics of that population.
Dos and don’ts
Use consistent colour to represent the same gender.
Sort the values in descending order by age group, so that the oldest group appears at the top.
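The data preparation behind a population pyramid can be sketched as follows. The counts and age groups are hypothetical. Negating the male values so their bars extend to the left is a common plotting trick assumed here, not a requirement of any particular tool.

```python
# Hypothetical counts per age group: (age_group, males, females),
# listed from youngest to oldest.
population = [
    ("0-14", 120, 110),
    ("15-64", 400, 390),
    ("65+", 80, 100),
]

total = sum(males + females for _, males, females in population)

rows = []
# Reverse so the oldest group comes first (top of the chart).
for age_group, males, females in reversed(population):
    male_pct = round(100 * males / total, 1)
    female_pct = round(100 * females / total, 1)
    # Negative values place the male bars to the left of the centre axis.
    rows.append((age_group, -male_pct, female_pct))

for age_group, male_pct, female_pct in rows:
    print(f"{age_group:>6}  {male_pct:7.1f}  {female_pct:6.1f}")
```

Each row holds the age group, the male share as a negative percentage, and the female share as a positive percentage of the total population. When plotting, the axis labels on the left side would normally be shown as absolute values.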