# Chart Gallery - Correlation

## Bubble chart

#### Description

A bubble chart displays multi-dimensional data in a two-dimensional plot. It can be considered a variation of the scatterplot in which the dots are replaced with bubbles. Unlike a basic scatterplot, which encodes only two variables through position on the X and Y axes, a bubble chart can encode a third variable in the size of each bubble and a fourth in its colour. A fifth variable (usually time) can also be shown by animating the chart so that the bubbles change over time.

#### When to use

A bubble chart is primarily used to compare numeric variables and show the relationships between them through position and proportion. It supports the analysis of patterns or correlations in a data set.

#### Dos and don’ts

Try to limit the number of points to plot.

Draw bubbles with transparency.

Don’t draw the size of the circle based on its radius or diameter; use the circle’s area instead.

Use a legend and/or interactivity (such as clicking or hovering over a bubble) to improve the readability and understanding of the chart.
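
The area rule above can be sketched in a few lines. This is a minimal illustration that is not tied to any charting library; `bubble_radius` and its parameters are hypothetical names chosen for the example.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Return a radius so that bubble AREA is proportional to value.

    Hypothetical helper: max_radius is the radius given to max_value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so the radius scales with sqrt(value)
    return max_radius * math.sqrt(value / max_value)


# Illustrative values only
values = [100, 50, 25]
radii = [bubble_radius(v, max_value=100, max_radius=20) for v in values]
# Scaling the radius linearly would draw the value 25 at radius 5 instead of 10,
# giving it 1/16 of the largest bubble's area for 1/4 of the value.
```

Some libraries already interpret the size argument as an area (matplotlib's `scatter` takes `s` in points squared), in which case the value can be passed proportionally without taking a square root.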

## Heatmap

#### Description

A heatmap is a type of visualization in which values are depicted through variations in colour within a two-dimensional matrix of cells. It allows us to visualize complex data and understand it at a glance.

#### When to use

Heatmaps are useful for visualizing variance across multiple variables to show patterns in correlations.
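
One common input for such a heatmap is a correlation matrix, where each cell holds the correlation between a pair of variables. The sketch below computes one with the standard library only; the function names and the sample data are illustrative assumptions, not part of the gallery.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return (names, matrix); matrix[i][j] is the correlation of columns i and j."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Illustrative data only, not taken from a real data set
data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 69, 80],
    "errors": [9, 7, 4, 2],
}
names, matrix = correlation_matrix(data)
```

Each row and column of `matrix` corresponds to one cell row and column of the heatmap, with values between -1 and 1.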

#### Dos and don’ts

Use a colour palette that matches the data: a sequential palette to differentiate high values from low values, or a diverging palette when a reference value, such as zero, sits in the middle of the data range.

Include a legend to support the understanding of the chart.

Include values in cells for static versions, when the fine differences between variables are important.
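
The difference between the two palette types can be sketched as two ways of mapping a value to a position along the palette, from 0 (one end) to 1 (the other end). Both helpers below are hypothetical and library-independent; they only show the idea that a diverging palette pins the reference value to the palette's midpoint.

```python
def sequential_position(value, vmin, vmax):
    """Map value linearly to [0, 1] for a sequential palette."""
    value = min(max(value, vmin), vmax)  # clamp to the data range
    return (value - vmin) / (vmax - vmin)


def diverging_position(value, vmin, vmax, center=0.0):
    """Map value to [0, 1] so that `center` always lands on 0.5."""
    if not vmin < center < vmax:
        raise ValueError("center must lie strictly between vmin and vmax")
    value = min(max(value, vmin), vmax)  # clamp to the data range
    if value < center:
        # Lower half of the palette covers vmin..center
        return 0.5 * (value - vmin) / (center - vmin)
    # Upper half of the palette covers center..vmax
    return 0.5 + 0.5 * (value - center) / (vmax - center)
```

With a data range of -1 to 3, `sequential_position(0, -1, 3)` places zero a quarter of the way along the palette, while `diverging_position(0, -1, 3)` places it at the neutral midpoint. In matplotlib, `TwoSlopeNorm` provides a comparable centred normalization.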

## Scatterplot

#### Description

A scatterplot is a type of visualization that uses Cartesian coordinates to display two variables for a set of data. The data are displayed as a collection of dots. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point.

#### When to use

Scatterplots are commonly used to observe the relationship or correlation between two numeric variables. The relationships between variables can be described in many ways: positive or negative, strong or weak, linear or nonlinear. In a scatterplot, we can not only see the values through individual data points but also patterns when the data are taken as a whole.

Typically, a scatterplot displays only two numeric variables. However, additional variables can be displayed by encoding them through colour (categorical variable), size (numeric variable), and animation (time series variable).

#### Dos and don’ts

Avoid overplotting: when many data points overlap each other, relationships between variables become hard to identify.

If overplotting can’t be avoided, use transparency to improve readability.

Don’t interpret correlation as causation.

Consider adding a regression line to support the analysis.
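
A regression line for a scatterplot can be computed with an ordinary least-squares fit. The sketch below assumes a simple straight-line model and uses only the standard library; `fit_line` and the sample points are illustrative, and other fitting methods may suit a given data set better.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative points only
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
# Endpoints of the line to draw across the plotted x range
line = [(x, slope * x + intercept) for x in (min(xs), max(xs))]
```

Drawing the segment between the two points in `line` overlays the fitted trend on the dots.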

## Connected scatterplot

#### Description

A connected scatterplot is a type of visualization that displays the evolution of a series of data points connected by straight line segments. It is not always the most intuitive chart to read, but it can be very effective for storytelling.

#### When to use

A connected scatterplot is often used to show a trend in data and the relationship between two variables over intervals of time. It works much like a line chart, except that the data points of a connected scatterplot are highlighted with markers (dots). A well-designed connected scatterplot can be a powerful tool for storytelling, so it is worth spending extra time on the design.

#### Dos and don’ts

Use arrows along the line to indicate the direction of evolution when a third, ordered variable (very often time) is encoded.

Use an appropriate measurement interval.

Don’t plot too many data series.
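
The direction of evolution depends on the points being connected in the order of the third variable. A minimal sketch, assuming each observation is a `(time, x, y)` tuple (a hypothetical layout), orders the points and pairs consecutive ones into segments that point forward in time:

```python
def directed_segments(observations):
    """Order (time, x, y) observations by time and pair consecutive points.

    Each returned segment is ((x0, y0), (x1, y1)) and points forward in time,
    so an arrowhead belongs at the second point.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    points = [(x, y) for _, x, y in ordered]
    return list(zip(points, points[1:]))


# Illustrative values only: (year, x, y), deliberately out of order
observations = [(2021, 3.0, 7.5), (2019, 1.0, 9.0), (2020, 2.5, 8.0)]
segments = directed_segments(observations)
```

Each segment can then be drawn as a line with an arrowhead at its end point.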

## Tree diagram

#### Description

A tree diagram is a graphical representation that displays hierarchical data in a tree-like structure. Connecting lines extend from a root node, the member that has no superior or parent. These connecting lines (also called “branches”) link the other nodes together to show the relationships between members. Finally, the leaf nodes (also called “end-nodes”) are the members with no further extensions.

#### When to use

Tree diagrams are often used to display organizational hierarchy such as organization charts which give clear information on reporting lines and structure. They can also be used to show family relations and descent (known as a “family tree”).

#### Dos and don’ts

Use simple shapes such as rectangles or circles as nodes.

Use consistent colours throughout the chart.

Build the diagram outward from the root node, either from top to bottom or from left to right.

Each node except the root node must have exactly one connection to the level above it in the hierarchy (its parent).
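
The structural rules of a tree (one root, one parent per node, leaves with no children) can be checked before drawing. The sketch below assumes the hierarchy is supplied as `(parent, child)` pairs; `build_tree`, `leaves`, and the sample organization are hypothetical.

```python
def build_tree(edges):
    """Build {node: [children]} from (parent, child) pairs and validate it.

    Returns (root, children). Raises ValueError if a node has two parents,
    if there is not exactly one root, or if a node is detached from the root.
    """
    parent_of = {}
    children = {}
    for parent, child in edges:
        if child in parent_of:
            raise ValueError(f"{child!r} has more than one parent")
        parent_of[child] = parent
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
    roots = [node for node in children if node not in parent_of]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    # Every node must be reachable from the root; this rejects detached cycles
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    if len(seen) != len(children):
        raise ValueError("some nodes are not connected to the root")
    return roots[0], children


def leaves(children):
    """Leaf nodes are the nodes with no children."""
    return [node for node, kids in children.items() if not kids]


# Illustrative organization chart only
edges = [("CEO", "CTO"), ("CEO", "CFO"), ("CTO", "Engineer")]
root, children = build_tree(edges)
```

Here `root` is the node to place at the top (or left) of the diagram, and `leaves(children)` lists the end-nodes.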

## Venn diagram

#### Description

A Venn diagram (or set diagram) is a type of diagram that uses overlapping shapes (typically circles) to illustrate all possible logical relationships between two or more sets of items. The area of overlap is known as the area of intersection; it contains the items that the sets have in common.

#### When to use

Venn diagrams are widely used in mathematics, statistics, computer science, and business to show relationships between multiple sets. They are an effective way to organize items graphically and to highlight how they are similar and different.

#### Dos and don’ts

Use contrasting colours for different sets.

Use transparency to allow clear distinction of the intersections.

Make sure text labels are clear and visible.

Don’t display too many sets in a single diagram.
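
The regions of a two-set Venn diagram map directly onto set operations. A minimal sketch using Python's built-in `set` type follows; `venn_regions` and the sample sets are illustrative assumptions.

```python
def venn_regions(a, b):
    """Split two collections into the three regions of a two-circle Venn diagram."""
    a, b = set(a), set(b)
    return {
        "only_a": a - b,  # items found in the first set alone
        "both": a & b,    # the area of intersection
        "only_b": b - a,  # items found in the second set alone
    }


# Illustrative sets only
can_fly = {"bat", "eagle", "butterfly"}
mammals = {"bat", "whale", "dog"}
regions = venn_regions(can_fly, mammals)
```

The number of regions grows quickly with each extra set (three for two sets, seven for three), which is one reason to keep the number of sets small.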