This chapter of Data Science for Water Utilities discusses sharing the results of data analysis, using R Markdown to create a PowerPoint presentation.

Sharing the Results of Data Analysis with R Markdown

Peter Prevos

Data science aims to create value from data by creating useful, sound and aesthetic data products. Analysing data is rewarding, but creating value requires you to communicate the results. An analysis in RStudio is fun to write but hard to share with anyone who does not understand the R language. Of course, you could copy and paste results into a document, but that is neither efficient nor reproducible. This chapter explains how to communicate the fruits of your labour with colleagues and other interested parties by generating reports that combine text and analysis through literate programming. It presents a workflow for data science projects and shows how to create data products with R Markdown. The learning objectives for this chapter are:

  • Implement the workflow for data science projects
  • Apply the principles of reproducible and replicable research
  • Use basic R Markdown to create a PowerPoint presentation from data

Data Science for Water Utilities

Data Science for Water Utilities published by CRC Press is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems using the R language for statistical computing.

The data and code used in this chapter are available on GitHub:

Data Science Workflow

The data science workflow starts with preparation and data cleaning, which can take most of the time in some projects. All projects should begin with a concise problem statement to prevent open-ended data dredging. Programmatic data cleaning is the topic of the next chapter.

Once the data has been tidied, we can try to understand the data through exploratory analysis using descriptive statistics and visualisation. That insight allows us to model and reflect, an iterative process known as the 'data vortex'.

Once we have created the model that best answers the problem statement, we can communicate these results to the consumers of our data products. One method is through literate programming with tools such as R Markdown or Org Mode.

Data science workflow.

Literate Programming and Reproducible Analysis

Literate programming means integrating the text, the data and the code. This approach ensures that the analysis is reproducible, which means you can repeat the same process with new data in the same way.

A data science project consists of at least three sets of code:

  1. Cleaning code to get from raw data to a tidy state
  2. Analytical code to analyse the data
  3. Presentation code to create summaries, tables and figures.
Reproducible research pipeline.
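
The three sets of code listed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame, its column names and its values are hypothetical and do not come from the book's repository.

```r
# Hypothetical raw data: consumption readings per supply zone.
# The values are made up for illustration only.
raw <- data.frame(
  zone = c("A", "A", "B", "B", "B"),
  consumption = c("120", "135", NA, "98", "110"),  # stored as text, one gap
  stringsAsFactors = FALSE
)

# 1. Cleaning code: get from raw data to a tidy state
clean <- raw[!is.na(raw$consumption), ]
clean$consumption <- as.numeric(clean$consumption)

# 2. Analytical code: mean consumption per zone
result <- aggregate(consumption ~ zone, data = clean, FUN = mean)

# 3. Presentation code: a summary table and a figure
print(result)
barplot(result$consumption,
        names.arg = result$zone,
        ylab = "Mean consumption",
        main = "Mean consumption by zone")
```

Keeping the three stages separate makes it easier to rerun the same analysis when new raw data arrives, because only the input changes.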

With literate programming, you can combine text and code in one script. The most popular method with R is to use an R Markdown file. This is the WYSIWYM (What You See Is What You Mean) approach instead of the more popular WYSIWYG (What You See Is What You Get) approach in word processors.

What You See Is What You Mean (WYSIWYM)

The left part of the screen shows what it looks like when I write a book, and the right part shows the end result when exported as a PDF file.
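
As a rough sketch of what combining text and code looks like in practice, the R script below writes a minimal R Markdown file that targets PowerPoint output. The file name, the title and the use of the built-in `cars` data set are assumptions chosen for illustration and are not taken from the book's repository. Rendering requires the rmarkdown package and Pandoc, so the render call is left commented out.

```r
# Build the three-backtick chunk delimiter programmatically so that this
# script can itself be displayed inside a fenced code block.
fence <- strrep("`", 3)

# Lines of a minimal R Markdown file: YAML header, Markdown text, one R chunk.
rmd <- c(
  "---",
  "title: \"Example presentation\"",        # hypothetical title
  "output: powerpoint_presentation",        # output format from rmarkdown
  "---",
  "",
  "## Stopping distance",
  "",
  "Text written in Markdown sits next to the code that produces the results.",
  "",
  paste0(fence, "{r, echo=FALSE}"),         # opens an R code chunk
  "plot(cars$speed, cars$dist)",
  fence                                     # closes the chunk
)

# Hypothetical file name, written to a temporary directory
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, rmd_file)

# Rendering needs the rmarkdown package and Pandoc installed:
# rmarkdown::render(rmd_file)
```

The YAML header states what the document means to be (a presentation), and the formatting is left to the rendering step, which is the WYSIWYM idea described above.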

The screencast below introduces the principles of this approach.

Sharing the Results of Data Analysis Screencast

Chapter six of Data Science for Water Utilities explains using literate programming with R Markdown in more detail. The screencast below reviews the code for this chapter.

Sharing the Results of Data Analysis with RMarkdown.

Additional Resources

The consumer-involvement.Rmd file in the case-studies folder of the GitHub repository shows how to write a journal article, including references managed in a BibTeX file. The concept of Consumer Involvement is explained in Chapter 8.

R Markdown: The Definitive Guide by Yihui Xie and others explains how to use literate programming in R in great detail.

Other Chapters

Previous Chapter: Visualising Data with ggplot2

Next Chapter: Managing Dirty Data

Feel free to contact me if you have any comments, suggestions or questions about this book.
