Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create a map of a sample of their water consumption data. In this post, I share this little ditty to explain how to plot a bubble chart over a map using the ggmap package.

You can find the code and data for this article on my GitHub repository. With thanks to Ms Quy and Mr Tuyen of Phú Thọ Water for their permission to use this data. Other articles on this blog detail how to analyse water consumption from digital metering data.

The sample data contains a list of just over 100 readings from water meters in the city of Việt Trì in Vietnam, plus their geospatial location. This data uses the World Geodetic System of 1984 (WGS84), which is compatible with Google Maps and similar systems.

```
# Load the data

# Summarise the data
summary(water$Consumption)
```

The consumption at each connection is between 0 and 529 cubic metres, with a mean consumption of 23.45 cubic metres.
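Before mapping, it can help to confirm that the coordinates look like WGS84 decimal degrees and that no reading is negative. The snippet below is only a sketch of such a check: `water_demo` is a made-up stand-in with invented values, and the only thing it borrows from the real data is the column names `lon`, `lat` and `Consumption`.

```r
# Toy stand-in for the meter data; all values are invented for illustration
water_demo <- data.frame(
  lon = c(105.401, 105.403, 105.402),
  lat = c(21.321, 21.323, 21.322),
  Consumption = c(0, 12, 529)
)

# WGS84 longitudes lie in [-180, 180] and latitudes in [-90, 90]
valid_coords <- with(water_demo,
                     lon >= -180 & lon <= 180 & lat >= -90 & lat <= 90)
stopifnot(all(valid_coords))

# A negative consumption would point to a meter reading problem
stopifnot(all(water_demo$Consumption >= 0))

summary(water_demo$Consumption)
```

For locations in Vietnam the longitude is above 90, so this check also flags a data set where the two coordinate columns were accidentally swapped.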

Visualise the data with a geographic bubble chart

With the ggmap extension of the ggplot2 package, we can visualise any spatial data set on a map. The only condition is that the spatial coordinates are in the WGS84 datum. The ggmap package adds a geographical layer to ggplot2 by drawing a Google Maps or OpenStreetMap canvas underneath the data. The first step is to download the map canvas. To do this, you need to know the centre coordinates, given as longitude followed by latitude, and the zoom factor. Finding the right zoom factor requires some trial and error. The ggmap package provides various map types, which are described in detail in the documentation.

```
library(ggmap)
# Find the centre of the points
centre <- c(mean(range(water$lon)), mean(range(water$lat)))
viettri <- get_map(centre, zoom = 17, maptype = "hybrid")
g <- ggmap(viettri)
```
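The trial and error for the zoom factor can be shortened with a rough first guess. The sketch below is not part of ggmap: `estimate_zoom` is a hypothetical helper that assumes the usual web map tiling (256-pixel tiles, with the scale doubling at each zoom level), a map image 640 pixels wide, and a zoom range of 3 to 21. It only looks at the east-west extent of the points, so treat the result as a starting value and adjust from there.

```r
# Hypothetical helper: first guess for the zoom level from the longitude span.
# Assumptions: 256 px tiles that double in scale per zoom level,
# a 640 px wide map image, and a usable zoom range of 3 to 21.
estimate_zoom <- function(lon, map_px = 640, tile_px = 256) {
  span <- diff(range(lon))                      # east-west extent in degrees
  stopifnot(span > 0)
  # At zoom z the full 360 degrees cover tile_px * 2^z pixels
  zoom <- floor(log2(360 * map_px / (tile_px * span)))
  max(3, min(zoom, 21))                         # clamp to the assumed range
}

# Invented longitudes spanning 0.005 degrees
lon_demo <- c(105.400, 105.402, 105.405)
estimate_zoom(lon_demo)   # 17 for this toy span
```

A wider span gives a lower zoom level: doubling the extent to 0.01 degrees drops the estimate by one.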

The ggmap package follows the same conventions as ggplot2. We first call the map layer and then add any required geom. The point geom creates a nice bubble chart when used in combination with the `scale_size_area` option. This option makes the area of each point proportional to its value, and `max_size` sets the size of the largest point so that the bubbles are easily visible. The transparency (alpha) reduces problems with overplotting. This last code snippet plots the map with water consumption.

```
# Add the points
g + geom_point(data = water, aes(x = lon, y = lat, size = Consumption),
               shape = 21, colour = "dodgerblue4", fill = "dodgerblue", alpha = .5) +
    scale_size_area(max_size = 20) + # Size of the biggest point
    ggtitle("Việt Trì sự tiêu thụ nước") # Title: "Việt Trì water consumption"
```
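Scaling by area rather than radius matters for how the bubbles are read. The sketch below illustrates the idea in base R; it is not the ggplot2 implementation, and `bubble_radius` and its `max_radius` argument are hypothetical names chosen for this example. When the area is proportional to the value, the radius grows with the square root of the value.

```r
# Hypothetical helper: radius of a bubble whose area is proportional to value
bubble_radius <- function(value, max_radius = 10) {
  stopifnot(all(value >= 0), max(value) > 0)
  # area ~ value, and area ~ radius^2, so radius ~ sqrt(value)
  max_radius * sqrt(value / max(value))
}

# Invented consumption values
consumption_demo <- c(0, 25, 100, 400)
bubble_radius(consumption_demo)   # radii: 0, 2.5, 5 and 10
```

A connection that uses a quarter of the maximum gets a bubble with half the radius and therefore a quarter of the area. Scaling the radius directly would make large consumers look far bigger than they are.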

This map visualises water consumption in the targeted area of Việt Trì. The larger the bubble, the larger the consumption. It is no surprise that two commercial customers used the most water. The legend for the consumption variable is added automatically by ggplot2.
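Transparency helps with overplotting, but only up to a point. As a rough illustration, and assuming identical fills that are simply layered on top of each other, the combined opacity of `n` stacked bubbles with transparency `alpha` is `1 - (1 - alpha)^n`. The helper `stacked_opacity` below is a hypothetical name for this formula and ignores the bubble outlines.

```r
# Hypothetical helper: combined opacity of n identical semi-transparent layers
stacked_opacity <- function(n, alpha = 0.5) {
  1 - (1 - alpha)^n
}

# With alpha = 0.5 the stack is nearly opaque after a handful of bubbles
stacked_opacity(1:5)   # 0.5, 0.75, 0.875, 0.9375, 0.96875

# A lower alpha keeps small and large stacks further apart
stacked_opacity(c(3, 30), alpha = 0.1)
```

With `alpha = .5`, five overlapping bubbles are already almost indistinguishable from fifty. A lower value keeps dense clusters readable at the cost of fainter single bubbles.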

6 thoughts on “Visualising Water Consumption using a Geographic Bubble Chart”

1. Hi peter, your post isnt coming thru in its entirety on r-bloggers. You might want to set your rss feed to complete articles? Kind regards,Roel

1. Hi Roel, should be fixed now. Thanks for notifying.

2. Nice effort! Consider reducing the opacity (not sure if u can do that w/ ggmap. If you can’t use leaflet) in order to better see overlap. In the case of the multiple small bubbles it’s difficult to know whether there’s 3 circles overlapped or 30. Which obviously makes a big difference 🙂 Another way to visualize could be to overlap a heatmap where values were aggregated into grids to show relative “hotness”.

1. The `alpha` variable in the point geom controls the opacity of the circles.

To create a heatmap, you need to convert the water meter readings to counts and then use the `stat_bin_2d` function.

```
water_count <- data.frame(lat = rep(water[,"lat"], water$Consumption),
                          lon = rep(water[,"lon"], water$Consumption))

g + stat_bin_2d(data = water_count, aes(x = lon, y = lat),
                size = .5, bins = 30, alpha = 1/2) +