This chapter of the Data Science for Water Utilities course teaches how to visualise data with ggplot2 using a water quality case study.

Visualising Data with ggplot2

Peter Prevos

Visualisation is essential for data scientists who want to explore and understand data and communicate their results. Scientists have studied in great depth how the mind perceives graphics and have devised a comprehensive body of knowledge that helps us create sound, useful and aesthetic visualisations. The first part of this chapter of Data Science for Water Utilities introduces some best-practice principles of data visualisation. The second part presents the ggplot2 package and shows how to apply the principles of the Grammar of Graphics to create basic visualisations. The learning objectives for this chapter are:

  • Evaluate data visualisations using basic principles.
  • Apply the principles of the Grammar of Graphics.
  • Visualise water quality data with the ggplot2 package.

Data Science for Water Utilities

Data Science for Water Utilities, published by CRC Press, is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems using the R language for statistical computing.

The data and code used in this chapter are available on GitHub:

Principles of Visualisation

Data visualisation needs to tell a story: when somebody looks at the image, they should be able to draw a conclusion. A visualisation should not be a case of "choose your own adventure". Achieving this objective requires simplicity and graphic austerity.

The data-to-pixel ratio is a rule of thumb that helps to focus your visualisation on the story. This ratio compares the number of pixels that represent data with the total number of pixels in the image, excluding the neutral background. The higher the ratio, the more of the image is devoted to the data.

The example below illustrates the idea. The image on the left might be interesting, but the background is distracting, and the colours don't express data. The version on the right tells the story much more clearly: we can immediately see which imaginary suburb has the best taste scores for tap water.

Data-to-pixel ratio: low (left) and high (right).
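The same contrast can be sketched in ggplot2. The suburb names and taste scores below are made-up values for illustration only. They are not data from the book. The first plot gives every bar its own fill colour, which only repeats the axis labels. The second uses one neutral fill, sorts the bars and applies a minimal theme, which should raise the data-to-pixel ratio.

```r
library(ggplot2)

# Made-up taste scores for imaginary suburbs (illustration only)
taste <- data.frame(
  suburb = c("Suburb A", "Suburb B", "Suburb C", "Suburb D"),
  score  = c(6.2, 7.8, 5.4, 8.1)
)

# Low data-to-pixel ratio: grey panel background and a fill colour per bar
# that repeats the axis labels instead of expressing data
busy <- ggplot(taste, aes(x = suburb, y = score, fill = suburb)) +
  geom_col()

# Higher data-to-pixel ratio: one neutral fill, bars sorted by score,
# a minimal theme and no redundant legend
clean <- ggplot(taste, aes(x = reorder(suburb, score), y = score)) +
  geom_col(fill = "grey40") +
  coord_flip() +
  theme_minimal() +
  labs(x = NULL, y = "Taste score")

print(clean)
```

Sorting the bars with reorder() puts the suburb with the best score at the top of the flipped chart. This lets the reader find the answer without scanning every bar.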

Grammar of Graphics and ggplot2

The ggplot2 package is part of the Tidyverse. It implements the Grammar of Graphics, a systematic approach to visualising data using layers. The functions in ggplot2 are chained with a + sign, each one adding a layer to the plot, as shown below. You can combine as many layers as you like.

Grammar of Graphics and ggplot2 layers.
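A minimal sketch of this layered approach follows. The turbidity readings and zone names are made-up values for illustration, not the chapter's case-study data. The selection of layers is only one possible combination and may differ from the figure. Each + adds one component: the data and aesthetic mapping, two geometries, facets, labels and a theme.

```r
library(ggplot2)

# Made-up turbidity readings for two supply zones (illustration only)
turbidity <- data.frame(
  date = rep(as.Date("2023-01-01") + 0:4, times = 2),
  zone = rep(c("Zone A", "Zone B"), each = 5),
  ntu  = c(0.4, 0.6, 0.5, 1.2, 0.7, 0.3, 0.3, 0.4, 0.5, 0.4)
)

# Each + adds a component to the plot object
p <- ggplot(turbidity, aes(x = date, y = ntu)) + # data and aesthetic mapping
  geom_line() +                                  # geometry: lines
  geom_point() +                                 # geometry: points
  facet_wrap(~zone) +                            # facets: one panel per zone
  labs(x = NULL, y = "Turbidity (NTU)") +        # labels
  theme_minimal()                                # theme

print(p)
```

Because the facets already separate the zones, the sketch does not map zone to colour as well. That would encode the same variable twice.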

Using Colour in Visualisations

Many graphs overuse colour. Colour should only be used to comply with a style guide or to express data.

The ColorBrewer website provides a great interface to explore various colour palettes.
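ColorBrewer palettes can be used directly in ggplot2 through scale_colour_brewer(). In the sketch below, colour encodes a variable (the sampling zone), so each colour carries information. The chlorine values and zone names are made up for illustration. The choice of the qualitative "Dark2" palette is an assumption, not a recommendation from the book.

```r
library(ggplot2)

# Made-up free chlorine residuals at three sampling zones (illustration only)
chlorine <- data.frame(
  zone     = rep(c("North", "East", "West"), each = 4),
  sample   = rep(1:4, times = 3),
  chlorine = c(0.8, 0.7, 0.9, 0.6, 0.5, 0.4, 0.6, 0.5, 1.1, 1.0, 0.9, 1.2)
)

# Colour is mapped to zone, so it expresses data rather than decoration;
# "Dark2" is a qualitative ColorBrewer palette
p <- ggplot(chlorine, aes(x = sample, y = chlorine, colour = zone)) +
  geom_line() +
  geom_point() +
  scale_colour_brewer(palette = "Dark2") +
  labs(x = "Sample", y = "Free chlorine (mg/L)", colour = "Zone") +
  theme_minimal()

print(p)
```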

Since I wrote this book, the latest version of R has received an upgraded colour palette. This article explains the new built-in colour palettes.
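The link to the article is not preserved here. The sketch below assumes it refers to the palette changes in base R 4.0.0. That version made "R4" the default palette() and added palette.pals() and palette.colors(). It also uses hcl.colors(), which has been available since R 3.6.0. Whether these are exactly the palettes the article discusses is an assumption.

```r
# Assumes R 4.0.0 or later for palette.pals() and palette.colors()
print(palette.pals())                    # names of the built-in palettes

# Eight colours from the colour-blind-friendly Okabe-Ito palette
okabe_ito <- palette.colors(n = 8, palette = "Okabe-Ito")
print(okabe_ito)

print(palette())                         # current default palette

# Five colours from the sequential "viridis" HCL palette (R 3.6.0 or later)
seq_cols <- hcl.colors(5, palette = "viridis")
print(seq_cols)
```

These vectors of colour codes can also be passed to ggplot2, for example through scale_colour_manual(values = ...).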

Emil Hvitfeldt has gathered a large collection of colour palettes available in R.

Visualising Data with ggplot2 Screencast

Chapter five of Data Science for Water Utilities explains in more detail how to visualise data with ggplot2. The screencast below reviews the code for this chapter.

Visualising Water Quality Data with ggplot2.

Additional Resources

To help you remember the various functions discussed in the first five chapters of the book, a cheat sheet is available.

Addendum

The size parameter in ggplot2 geometries that draw lines was deprecated in ggplot2 3.4.0 and has been replaced with linewidth.
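For line geometries, code written for earlier versions of ggplot2 therefore needs size replaced with linewidth. This is a minimal sketch, assuming ggplot2 3.4.0 or later. It uses the economics dataset that ships with ggplot2 rather than the book's data.

```r
# Assumes ggplot2 3.4.0 or later, where linewidth replaced size for lines
library(ggplot2)

# economics is a dataset bundled with ggplot2
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(linewidth = 0.8)   # before 3.4.0: geom_line(size = 0.8)

print(p)
```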

Other Chapters

Previous Chapter: Descriptive Statistics

Next Chapter: Sharing the Results

Feel free to contact me if you have any comments, suggestions or questions about this book.
