Data Visualisation

Learning Objectives

After this unit, students should be able to

  • explain the need for data visualisation.
  • remember the conventions of ethical visualisation.
  • choose an appropriate visualisation based on the kind of data and the analytics question.

Anscombe's Quartet provides the best motivational example to accentuate the need for data visualisation. It is a collection of four toy datasets with \(11\) points each. Just looking at their descriptive statistics will make us think that the datasets are practically identical. The descriptive statistics of each dataset are approximately as follows:

  • \(\mu_X = 9.0\), \(\mu_Y = 7.5\).
  • \(\rho_{XY} = 0.82\) for each of the datasets.
  • Equation of the regression line to predict \(Y\) using \(X\) as the predictor: \(Y = 3 + 0.5X\).
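
The statistics above can be reproduced with a few lines of Python. The following is a minimal sketch using only the standard library. The data values are the commonly published values of Anscombe's Quartet and are not part of these notes; the names `QUARTET` and `summarise` are chosen here purely for illustration.

```python
from math import sqrt

# Commonly published values of Anscombe's Quartet (assumed, not from these notes).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return (mean_x, mean_y, correlation, intercept, slope) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                      # least-squares slope of Y on X
    intercept = mean_y - slope * mean_x    # least-squares intercept
    r = sxy / sqrt(sxx * syy)              # Pearson correlation coefficient
    return mean_x, mean_y, r, intercept, slope


SUMMARY = {name: summarise(x, y) for name, (x, y) in QUARTET.items()}

for name, (mx, my, r, b0, b1) in SUMMARY.items():
    print(f"{name}: mean_x={mx:.1f} mean_y={my:.2f} r={r:.2f} Y = {b0:.2f} + {b1:.2f}X")
```

All four rows of the printed summary agree to the precision shown, which is why summary statistics alone cannot tell the datasets apart.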

If we do a simple scatter plot of the datasets, we see a totally different picture: the datasets are significantly different from each other.

Anscombe

It is often said that a picture is worth a thousand words. But is it always true? Consider the following examples:

There are certain rules of thumb that we can keep in mind while we perform any sort of data visualisation.

Tables versus Plots

  • Tables should be preferred to plots if the values need to be precise.
  • Tables should be preferred to plots if the values have varied units of measurements.
  • Plots should be preferred to tables if the intention is to observe the trend in the data rather than the actual values.

Dimensions of a Plot

A two-dimensional plot can represent up to five dimensions of the data with the use of colour, size and shape. The plot can be further animated to include the temporal dimension. An example is shown in the following figure. Conventionally, good visualisations do not use more than three dimensions in a plot. Using more than three dimensions puts a higher cognitive load on the reader and makes interpretation difficult.

dims
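
As a rough illustration of how five data dimensions can be mapped onto a two-dimensional plot, the sketch below encodes each record as a position (two dimensions), a colour, a marker size and a marker shape. The records, the colour and marker choices, and the names `RECORDS`, `encode` and `draw` are all hypothetical. The `draw` function assumes that matplotlib is installed.

```python
# Hypothetical records: (x, y, region, magnitude, group). Values are illustrative only.
RECORDS = [
    (1.0, 2.0, "north", 3.0, "a"),
    (2.0, 1.5, "south", 6.0, "a"),
    (3.0, 3.5, "north", 9.0, "b"),
    (4.0, 2.5, "south", 4.0, "b"),
]

COLOURS = {"north": "tab:blue", "south": "tab:orange"}  # third dimension -> colour
MARKERS = {"a": "o", "b": "s"}                          # fifth dimension -> shape


def encode(records):
    """Map each five-dimensional record onto position, colour, size and shape."""
    return [
        {
            "x": x,                      # first dimension -> horizontal position
            "y": y,                      # second dimension -> vertical position
            "color": COLOURS[region],
            "s": 20.0 * magnitude,       # fourth dimension -> marker area
            "marker": MARKERS[group],
        }
        for x, y, region, magnitude, group in records
    ]


def draw(records, path="dims.png"):
    """Render the encoded records as a scatter plot (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for point in encode(records):
        ax.scatter(point["x"], point["y"], s=point["s"],
                   color=point["color"], marker=point["marker"])
    fig.savefig(path)
    plt.close(fig)
```

Calling `draw(RECORDS)` writes the plot to `dims.png`. Even with four points, the reader has to decode three legends at once, which hints at why more than three dimensions quickly become hard to read.
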
Story Telling with Data

Direct Representation

A chart should not rely on the reader to interpret or perform mental computations to reach inferences. A good visualisation facilitates the story in a single snapshot. For instance, if the intention is to convey the profit ratio over the years, which of the following is a better plot?

proft_rations
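
One way to follow this rule is to compute the quantity of interest before plotting, so that the chart shows it directly. The sketch below assumes that the profit ratio is defined as profit divided by revenue, which is an assumption made here and not a definition given in these notes. The yearly figures and the names `REVENUE`, `COST` and `profit_ratio` are hypothetical.

```python
# Hypothetical yearly figures (illustrative only, not taken from the figure).
REVENUE = {2019: 120.0, 2020: 150.0, 2021: 160.0}
COST = {2019: 90.0, 2020: 120.0, 2021: 112.0}


def profit_ratio(revenue, cost):
    """Return profit / revenue per year (assumed definition of profit ratio)."""
    return {year: (revenue[year] - cost[year]) / revenue[year] for year in revenue}


RATIOS = profit_ratio(REVENUE, COST)

# These ratios are what the chart should display, one value per year,
# instead of separate revenue and cost series that the reader must divide mentally.
for year, ratio in sorted(RATIOS.items()):
    print(f"{year}: {ratio:.1%}")
```

Plotting `RATIOS` as a single line or bar series lets the reader see the trend at a glance, whereas plotting `REVENUE` and `COST` side by side leaves the division to the reader.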

Avoiding 3D Plots

Three-dimensional static plots shown on a screen or printed on paper are rendered as two-dimensional projections. The projection tends to distort the dimensions and gives false information to the reader. A typical example is presented in the following figure, wherein a three-dimensional pie chart clearly distorts the proportions of Item A and Item C.

3d_pie

Right Use of Colours

Colours must be chosen wisely so that the visualisation accentuates and conveys the analysis. They should neither interfere with human intuition nor hinder readability in general. When colours are overlaid on top of one another, extra care has to be taken with their mutual contrast. The following example shows the impact of contrast on readability.

contrast
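
Contrast can also be checked numerically. These notes do not prescribe a particular measure; the sketch below swaps in the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) as one possible choice. The function names are hypothetical, and colours are assumed to be sRGB triples with channels in the range 0 to 255.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour as defined by WCAG."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colours, from 1 (none) to 21 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE, YELLOW = (0, 0, 0), (255, 255, 255), (255, 255, 0)

print(f"black on white:  {contrast_ratio(BLACK, WHITE):.2f}")
print(f"yellow on white: {contrast_ratio(YELLOW, WHITE):.2f}")
```

Black text on a white background reaches the maximum ratio, while yellow on white stays close to the minimum. That matches the intuition that light colours overlaid on a light background are hard to read.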

You may refer to Charlie Custer's post to see a great commentary on the use of colours.

Do not lie!

This is the most important rule of thumb of ethical data visualisation. A data visualisation should in no way mislead readers about the analysis it is intended to convey. Consider the following survey published in a newspaper. Does the length of the bars in the plot quantify the amount of money or the fraction of people who spend money in the category? It does neither, and it creates a false impression on a layperson that most people spend a large amount of money on their significant other (not that I am against it!).

survey