The netCDF format is popular in sciences that analyse sequential spatial data. It is a self-describing, machine-independent data format for creating, accessing and sharing array-oriented information. It is widely used to distribute spatial time series such as meteorological or environmental data. This article shows how to visualise and analyse this data format by reviewing soil moisture data published by the Australian Bureau of Meteorology. The latest version of this code is available on my GitHub repository.
Soil Moisture data
The Australian Bureau of Meteorology publishes hydrological data in both a simple map grid and in the netCDF format. The map grid consists of a flat text file that requires a bit of data jujitsu before it can be used. The netCDF format is much easier to deploy as it provides a three-dimensional array of spatial data over time.
We are looking at the possible relationship between sewer main blockages and deep soil moisture levels. You will need to download this dataset manually from the Bureau of Meteorology website, as I have not been able to scrape the website automatically. For this analysis, I use the actual deep soil moisture level, aggregated monthly, in netCDF-4 format.
Reading, Extracting and Transforming the netCDF format
The ncdf4 library, developed by David W. Pierce, provides the necessary functionality to manage this data. The first step is to load the data, extract the relevant information and transform the data for visualisation and analysis. Opening the file with nc_open returns a nested list that describes the dimensions, variables and attributes of the file; the measurements themselves are read separately.
The ncvar_get function extracts the data from the file. The lon, lat and t variables are the dimensions of the moisture data. The time data is stored as the number of days since 1 January 1900. The spatial coordinates are stored in decimal degrees at 0.05-degree intervals. The moisture data is a three-dimensional array with longitude, latitude and time as dimensions. Naming the dimensions after their coordinates makes it easy to look up values later.
```r
library(ncdf4)

# Load data
bom <- nc_open("Hydroinformatics/SoilMoisture/sd_pct_Actual_month.nc")
print(bom) # Inspect the data

# Extract data
lon <- ncvar_get(bom, "longitude")
lat <- ncvar_get(bom, "latitude")
t <- as.Date("1900-01-01") + ncvar_get(bom, "time") # Days since 1 January 1900
moisture <- ncvar_get(bom, "sd_pct")

# Name the dimensions; as.character() keeps the dates readable as names
dimnames(moisture) <- list(lon, lat, as.character(t))

# Everything needed is now in memory, so the file can be closed
nc_close(bom)
```
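If you do not have the Bureau file at hand, the same steps can be tried on a tiny file written to a temporary location. This is only a sketch: the grid, the values and the name demo are made up for illustration, and only the variable and dimension names mirror the code above. It also reads the time origin from the units attribute of the file instead of typing it in, assuming the units follow the "days since ..." pattern.

```r
library(ncdf4)

# Hypothetical grid and values, chosen only to mimic the layout of the file
lon_dim  <- ncdim_def("longitude", "degrees_east", c(144.00, 144.05, 144.10))
lat_dim  <- ncdim_def("latitude", "degrees_north", c(-36.80, -36.75))
time_dim <- ncdim_def("time", "days since 1900-01-01", c(42914, 42945))
sd_var   <- ncvar_def("sd_pct", "percent", list(lon_dim, lat_dim, time_dim),
                      missval = -999, prec = "float")

# Write the file: 3 x 2 x 2 = 12 values
path <- tempfile(fileext = ".nc")
nc <- nc_create(path, sd_var)
ncvar_put(nc, sd_var, seq_len(12) / 10)
nc_close(nc)

# Read it back the same way as the Bureau file
demo <- nc_open(path)
var_names <- names(demo$var)          # Data variables in the file
time_units <- demo$dim$time$units     # Units string of the time dimension
origin <- sub("days since ", "", time_units)
dates <- as.Date(origin) + ncvar_get(demo, "time")
cube <- ncvar_get(demo, "sd_pct")     # longitude x latitude x time
nc_close(demo)

dim(cube)
dates
```

The order of the dimensions in the array follows the order in which they were defined for the variable, which is why the first index is longitude and the last is time.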
Visualising the data
The first step is to check the overall data. This code snippet extracts the matrix for 31 July 2017 from the cube, converts it to a data frame and passes it to ggplot for visualisation. Although I use the Tidyverse, I still need reshape2 because the gather function does not work on matrices.
```r
library(tidyverse)
library(RColorBrewer)
library(reshape2)

# Slice the cube at one date and reshape the matrix to long format
d <- "2017-07-31"
m <- moisture[, , which(t == d)] %>%
  melt(varnames = c("lon", "lat")) %>%
  subset(!is.na(value))

# borders() needs the maps package to be installed
ggplot(m, aes(x = lon, y = lat, fill = value)) +
  borders("world") +
  geom_tile() +
  scale_fill_gradientn(colors = brewer.pal(9, "Blues")) +
  labs(title = "Total moisture in deep soil layer (100-500 cm)",
       subtitle = format(as.Date(d), "%d %B %Y")) +
  xlim(range(lon)) +
  ylim(range(lat)) +
  coord_fixed()
```
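To see what the slicing and melting steps do, it helps to run them on a cube small enough to print. The sketch below uses a made-up 2 x 2 x 2 array; the names cube, slice and long and all values are hypothetical.

```r
library(reshape2)

# Hypothetical cube: longitude x latitude x time
lon <- c(144.00, 144.05)
lat <- c(-36.80, -36.75)
t <- as.Date(c("2017-06-30", "2017-07-31"))
cube <- array(c(0.2, 0.4, NA, 0.8, 0.1, 0.3, 0.5, 0.7),
              dim = c(2, 2, 2),
              dimnames = list(lon, lat, as.character(t)))

# Slice one month out of the cube; the result is a lon x lat matrix
slice <- cube[, , which(t == "2017-06-30")]

# melt() returns one row per cell and converts the dimension names to numbers
long <- melt(slice, varnames = c("lon", "lat"))

# Drop cells without a measurement
long <- subset(long, !is.na(value))
long
```

The long format, with one row per grid cell, is what geom_tile expects: one column for each axis and one for the fill value.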
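With the ggmap package we can create a nice map of a local area. The geocoded location is rounded to the nearest 0.05 degrees so that it lines up with the grid of the moisture data. Recent versions of ggmap need a registered Google API key before geocode and get_map will work.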
```r
library(ggmap)

# Geocode the town and snap the coordinates to the 0.05-degree grid
# Recent ggmap versions need an API key first, see register_google()
loc <- round(geocode("Bendigo") / 0.05) * 0.05

get_map(loc, zoom = 12) %>%
  ggmap() +
  geom_tile(data = m, aes(x = lon, y = lat, fill = value), alpha = 0.5) +
  scale_fill_gradientn(colors = brewer.pal(9, "Blues")) +
  labs(title = "Total moisture in deep soil layer (100-500 cm)",
       subtitle = format(as.Date(d), "%d %B %Y"))
```
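The rounding trick in the code above can be tested without calling the geocoding service. In this sketch the helper snap_to_grid is a hypothetical name, and the coordinates for Bendigo are approximate values typed in by hand. It assumes the grid coordinates fall on multiples of the step size.

```r
# Snap a coordinate to the nearest multiple of the grid step
snap_to_grid <- function(x, step = 0.05) {
  round(x / step) * step
}

# Approximate coordinates for Bendigo, entered manually (assumption)
pt <- data.frame(lon = 144.2794, lat = -36.7570)

snapped <- snap_to_grid(pt)
snapped
```

Dividing by the step, rounding and multiplying again works for any regular grid, so the same helper can be reused for datasets with a different resolution by changing the step argument.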
Analysing the data
For my analysis, I am interested in the time series of moisture data for a specific point on the map. The previous code slices the cube at a single point in time. To create a time series, we instead pierce through the cube at a specific coordinate. The purpose of this time series is to investigate the relationship between sewer main blockages and deep soil moisture, which can be a topic for a future post.
```r
# Extract all time steps for the grid cell nearest to the location
mt <- data.frame(date = t,
                 dp = moisture[as.character(loc$lon), as.character(loc$lat), ])

ggplot(mt, aes(x = date, y = dp)) +
  geom_line() +
  labs(x = "Month", y = "Moisture",
       title = "Total moisture in deep soil layer (100-500 cm)",
       subtitle = paste(as.character(loc), collapse = ", "))
```
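Looking up a cell by the character version of its coordinate only works when the rounded location matches a dimension name exactly, which can fail when the stored coordinates carry floating-point noise. A possible alternative, sketched here on a made-up cube with random values, is to pick the nearest cell by index. The names target, i, j and series are hypothetical.

```r
# Hypothetical grid and random values standing in for the moisture cube
lon <- seq(144.20, 144.40, by = 0.05)
lat <- seq(-36.85, -36.65, by = 0.05)
t <- seq(as.Date("2017-01-01"), by = "month", length.out = 6)

set.seed(1)
cube <- array(runif(length(lon) * length(lat) * length(t)),
              dim = c(length(lon), length(lat), length(t)))

# Point of interest, approximate coordinates entered by hand (assumption)
target <- list(lon = 144.28, lat = -36.76)

# Index of the nearest grid cell in each direction
i <- which.min(abs(lon - target$lon))
j <- which.min(abs(lat - target$lat))

# Pierce through the cube at that cell to get the time series
series <- data.frame(date = t, dp = cube[i, j, ])
series
```

Because the lookup uses numeric distance instead of text matching, it does not depend on how the coordinates are formatted as dimension names.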