Geocoding with ggmap and the Google API

Some of the most popular articles on the Devil is in the Data show how to visualise spatial data creatively. In the old days, obtaining latitude and longitude required a physical survey, with Google maps, this has become a lot easier.

The geocode function from the ggmap package extracts longitude and latitude from Google maps, based on a location query. The example below shows the result of a simple geocoding call for the White House and Uluru. The geocode function essentially constructs a URL to obtain the data.

In the middle of 2018, Google recently tightened access to the database, which means you need to register an API for it to work. This article explains how to use the latest version of ggmap and a Google account to continue using this function.

The Google API

Before we can start geocoding, we need to obtain an API key from Google. Go to the registration page, and follow the instructions (select all mapping options). The geocoding API is a free service, but you nevertheless need to associate a credit card with the account.

Please note that the Google Maps API is not a free service. There is a free allowance of 40,000 calls to the geocoding API per month, and beyond that calls are $0.005 each.

Geocoding with ggmap

You will need to ensure that you have the latest version of ggmap installed on your system. The current version on CRAN is 3.0.

The code snippet below shows a minimum-working-example of how you can map coordinates using ggplot. The register_google function stores the API key. I have stored the key itself in a private text file. The getOption("ggmap") function summarises the Google credentials to check how you are connected.

The geocode function converts the request into a URL and captures the output into a data frame. The plot shows the places I have lived, projected orthogonally on the globe.

Geocoding with ggmap and the Google API

Map Porn

The articles on this blog that rely on the geocode function are categorised as Map Porn because they mostly discuss having fun with maps in R. All code of these articles has been amended to function with the new method.

Flat Earth Mathematics with examples in the R Language

In September I am embarking on a trip around the world to speak at the 2018 World Water Congress in Tokyo, visit family and friends in the Netherlands and some tourism in San Francisco. One of the workshops I speak at is about Fake News in the water industry. There are many fake stories about tap water in circulation on the web, mainly related to the use of chlorine and fluoride. The internet provides humanity with almost unlimited knowledge, but instead, they use it to spread conspiracy theories. One of the craziest fake news trends is the flat earth conspiracy. Some of you might ask whether my trip actually around the world or am I travelling across a flat disk?

This article discusses flat earth mathematics, and how to convert and visualise map projections in R. This article uses the code I published earlier in articles about creating flight maps and Pacific island hopping. You can view the code below or download the full version from GitHub.

The Flat Earth

YouTube contains thousands of videos from people that claim the earth is flat. These people pontificate that science as we know it is a “globalist conspiracy”. While their claims are hilarious, people that believe our planet is flat are often passionate and even conduct simple experiments and engage in naive flat earth mathematics.

Adherents to the idea that the world is flat often propose Gleason’s Map as their version of the correct representation of our planet. This map is not a map of the flat earth but a polar Azimuthal equidistant projection of the globe. Even though the map itself states that it is a projection of the world, flat earth believers nevertheless see it as a literal representation of the world they live on.

Gleason's Map is often touted as a map of the flat earth.

Their belief is based on an appeal to common sense as they can see that the world is flat with their own eyes. The second ingredient is a deep distrust in science, often inspired by religious motives.

This article shows two different ways to look at the earth and show that the spherical model better fits the reality of my trip around the world.

Projecting the spherical earth on a flat surface is a complex task which will always require compromise as it is impossible to truthfully draw the surface of the globe on a piece of paper. The video below provides a practical introduction to map projections that show how maps are stretched to display them on a flat surface.

We can recreate Gleason’s map with ggplot, which incorporates the mapproj package to show maps in various projections. The Azimuthal equidistant method is popularised in the flag of the United Nations. Antarctica is in this projection displayed as a ring around the world. Flat earth evangelists believe that the South Pole a ring of ice that prevents us from proceeding beyond the disc. The edge of this disc cannot be shown because this method projects the south pole at an infinite distance from the centre of the map.

The coord_map function allows you to change projections. The orientation argument defines the centre point of the projection. The most common location is the North Pole (90, 0). The third value gives the clockwise rotation in degrees. Gleason’s map uses a rotation of 270 degrees.

Polar azimuthal equidistant projection with ggmap.

Polar azimuthal equidistant projection with ggmap.

My Round-the-World Itinerary

My trip will take me from Australia to Japan, from where I go to the Netherlands. The last two legs will bring me back to Australia via San Francisco.

The itinerary is stored in a data frame, and the ggmap package geocodes the longitude and latitude of each of the locations on my trip. You will need a Google API to enable the geocoding function.

As the earth is a sphere, an intermediate point needs to be added for trips that pass the dateline, as I explained in my article about flight maps.

I visualised the itinerary using the same method as above but centring on Antarctica. The geosphere package helps to estimate the total travel distance, which is approximately 38,995 km, slightly less than a trip around the equator. This distance is the great circle distance, which is the shortest distance between two points on a sphere, adjusted for a spheroid (a flattened sphere).

The flight paths on this map are curved because of the inevitable distortions when projecting a sphere on a flat surface.

Round the World Itinerary

Round the world trip in polar Azimuthal equidistant projection.

Round the world trip in polar Azimuthal equidistant projection.

Flat Earth Mathematics

If the Gleason map were an actual map of the flat earth, then the flight paths on the map would show as straight lines.

The mapproj package contains the mapproject function that calculates the projected coordinates based on longitude and latitude. The output of this function is a grid with limits from -\pi to \pi. The first part of the code converts the longitude and latitude from the world data frame to projected coordinates.

A line from the lon/lat 0,0 to the north pole has a projected distance of \pi/2, which in the spherical world is \pi / 2 * 6378.137 = 10018.75 km. We need to multiply the Euclidean distances with the radius of the Earth to derive the Gleason map coordinates.

This last code snippet converts the world map to flat earth coordinates. It calculates the Euclidean distance between the points on the itinerary and multiplies this with the Earth’s diameter.

This last code snippet shows why the Gleason map is not a map of a flat earth. On this map, the shortest distance between Sydney and Santiago de Chili is about 25,000 km, more than twice the real value. The real travel time is about 14 hours, which implies that passenger jets break the sound barrier. This problem exists for journeys along the lines of latitude in the Southern hemisphere. The distortion in this projection increases with the distance from the centre of the map.

This map looks like the first one, but the coordinate system is now Euclidean instead of longitudes and latitudes, as indicated by the square grid. On a projected map, the shortest distance is a curved line, parallel to Antarctica, which is how ships and aeroplanes move between these cities.

Sydney to Santiago de Chili on a flat earth map.

Sydney to Santiago de Chili on a flat earth map.

This article proves that the Gleason map is not a representation of a flat earth. Aeroplanes would have to break the sound barrier to fly these distances in the time it takes to travel. Whichever way you project the globe on a flat map will lead to inevitable distortions. The Gleason map itself mentions that it is a projection.

However, these facts will not dissuade people from believing in a flat earth. I am after all an engineer and thus part of the globalist science conspiracy.

The Flat Earth Mathematics Code

You can view the complete code on my GitHub repository.

Analysing soil moisture data in NetCDF format with the ncdf4 library

The netCDF format is popular in sciences that analyse sequential spatial data. It is a self-describing, machine-independent data format for creating, accessing and sharing array-oriented information. The netCDF format provides spatial time-series such as meteorological or environmental data. This article shows how to visualise and analyse this data format by reviewing soil moisture data published by the Australian Bureau of Statistics. The latest version of this code is available on my GitHub repository.

Soil Moisture data

The Australian Bureau of Meteorology publishes hydrological data in both a simple map grid and in the NetCDF format. The map grid consists of a flat text file that requires a bit of data jujitsu before it can be used. The NetCDF format is much easier to deploy as it provides a three-dimensional matrix of spatial data over time.

We are looking at the possible relationship between sewer main blockages and deep soil moisture levels. You will need to manually download this dataset from the Bureau of Meteorology website. I have not been able to scrape the website automatically. For this analysis, I use the actual deep soil moisture level, aggregated monthly in NetCDF 4 format.

Australian Landscape Water Balance

Australian Landscape Water Balance (Bureau of Meteorology).

Reading, Extracting and Transforming the netCDF format

The ncdf4 library, developed by David W. Pierce, provides the necessary functionality to manage this data. The first step is to load the data, extract the relevant information and transform the data for visualisation and analysis. When the data is read, it essentially forms a complex list that contains the metadata and the measurements.

The ncvar_get function extracts the data from the list. The lon, lat and dates variables are the dimensions of the moisture data. The time data is stored as the number of days since 1 January 1900. The spatial coordinates are stored in decimal degrees with 0.05-decimal degree intervals. The moisture data is a three-dimensional matrix with longitue, latitude and time as dimensions. Storing this data in this way will make it very easy to use.

Visualising the data

The first step is to check the overall data. This first code snippet extracts a matrix from the cube for 31 July 2017 and plots it. This code pipe extracts the date for the end of July 2017 and creates a data frame which is passed to ggplot for visualisation. Although I use the Tidyverse, I still need reshape2 because the gather function does not like matrices.

Deep soil moisture: Source Bureau of Meteorology, Australia

Deep soil moisture: Source Bureau of Meteorology, Australia

With the ggmap package we can create a nice map of a local area.

Soil Moisture aroud Bendigo

Analysing the data

For my analysis, I am interested in the time series of moisture data for a specific point on the map. The previous code slices the data horizontally over time. To create a time series we can pierce through the data for a specific coordinate. The purpose of this time series is to investigate the relationship between sewer main blockages and deep soil data, which can be a topic for a future post.

Deep soil moisture time series.

Soil moisture time series for Bendigo.

Pacific Island Hopping using R and iGraph

Last month I enjoyed a relaxing holiday in the tropical paradise of Vanuatu. One rainy day I contemplated how to go island hopping across the Pacific ocean visiting as many island nations as possible. The Pacific ocean is a massive body of water between, Asia and the Americas, which covers almost half the surface of the earth. The southern Pacific is strewn with island nations from Australia to Chile. In this post, I describe how to use R to plan your next Pacific island hopping journey. View the latest version of this code on GitHub.

Pacific Island hopping

The Pacific Ocean.

Listing all airports

My first step was to create a list of flight connections between each of the island nations in the Pacific ocean. I am not aware of a publically available data set of international flights so unfortunately, I created the list manually by collecting route maps on Pinterest (if you do know of a fomal data set with international flights, then please leave a comment).

My manual research resulted in a list of international flights from or to island airports. This list might not be complete, but it is a start. My Pinterest board with Pacific island airline route maps was the information source for this list.

The first code section reads the list of airline routes and uses the ggmap package to extract their coordinates from Google maps. You will need a Google API to enable the geocoding function. The data frame with airport coordinates is saved for future reference to avoid repeatedly pinging Google for the same information.

Create the map

To create a map, I modified the code to create flight maps I published in an earlier post. This code had to be changed to centre the map on the Pacific.

Mapping the Pacific ocean is problematic because the -180 and +180 degree meridians meet around the dateline. Longitudes west of the antemeridian are positive, while longitudes east are negative. The world2 data set in the borders function of the ggplot2 package is centred on the Pacific ocean. To enable plotting on this map, all negative longitudes are made positive by adding 360 degrees to them and defining the antipode.

 

Pacific island hopping - flight route map.

Pacific island hopping – flight route map.

Pacific Island Hopping

This visualisation is aesthetic and full of context, but it is not the best visualisation to solve the travel problem. This map can also be expressed as a graph with nodes (airports) and edges (routes). Once the map is represented mathematically, we can generate travel routes and begin our Pacific Island hopping.

The igraph package converts the flight list to a graph that can be analysed and plotted. The shortest_path function can then be used to plan routes. If I would want to travel from Auckland to Saipan in the Northern Mariana Islands, I have to go through Port Vila, Honiara, Port Moresby, Chuuk, Guam and then to Saipan. I am pretty sure there are quicker ways to get there, but this would be an exciting journey through the Pacific.

Pacific island hopping - network model

Pacific island hopping – network model

Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create a map of a sample of their water consumption data. In this post, I share this little ditty to explain how to plot a bubble chart over a map using the ggmap package.

In this post, I share this little ditty to explain how to plot a bubble chart over a map using the ggmap package. You can find the code and data for this article on my GitHub repository. With thanks to Ms Quy and Mr Tuyen of Phu Tho water for their permission to use this data. Other posts on this blog detail how to analyse water consumption from digital metering data.

This map visualises water consumption in the targeted area of Việt Trì. The larger the bubble, the larger the consumption. It is no surprise that two commercial customers used the most water. Ggplot automatically adds the legend for the consumption variable.

Geographic Bubble Chart: Visualising Water Consumption in Vietnam.

Load and Explore the Data

The sample data contains a list of just over 100 readings from water meters in the city of Việt Trì in Vietnam, plus their geospatial location. This data uses the World Geodetic System of 1984 (WGS84), which is compatible with Google Maps and similar systems.

The consumption at each connection is between 0 and 529 cubic metres, with an average usage of 23.45 cubic metres.

Visualise the data with a geographic bubble chart

With the ggmap extension of the ggplot package, we can visualise any spatial data set on a map. The only condition is that the spatial coordinates are in the WGS84 datum. The ggmap package adds a geographical layer to ggplot by adding a Google Maps or Open Street Map canvas. The first step is to download the map canvas. To do this, you need to know the centre coordinates and the zoom factor. To determine the perfect zoon factor requires some trial and error. The ggmap package provides for various map types, which are described in detail in the documentation.

The ggmap package follows the same conventions as ggplot. We first call the map layer and then add any required geom. The point geom creates a nice bubble chart when used in combination with the scale_size_area option. This option scales the points to a maximum size so that they are easily visible. The transparency (alpha) minimises problems with overplotting. This last code snippet plots the map with water consumption.

Euler Problem 19: Counting Sundays — When does the week start?

Euler Problem 19 is so trivial it is almost not worth writing an article about. One interesting aspect of this problem is the naming of weekdays and deciding which day the week starts. This issue is more complex than it sounds because data science is in essence not about data but about people.

Euler Problem 19 Definition

  • 1 Jan 1900 was a Monday.
  • Thirty days has September, April, June and November.
  • All the rest have thirty-one,
  • Saving February alone, Which has twenty-eight, rain or shine. And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Solution

The problem can be quickly solved with R base code and a tiny bit faster when using the lubridate package.

To draw out this post a little bit further, I wrote some code to solve the problem without using any of the calendar functions in R.

You can download the latest version of this code from GitHub.

Which day does the week start?

The only aspect remotely interesting about this problem is the conversion from weekdays to numbers. In R, Monday is considered day one, which makes sense in the Christian context of Western culture. Saturday and Sunday are the weekends, the two last days of the week so they are day 6 and 7. According to international standard ISO 8601, Monday is the first day of the week. Although this is the international standard, several countries, including the United States and Canada, consider Sunday to be the first day of the week.

The international standard is biased towards Christianity. The Christian or Western world marks Sunday as their day of rest and worship. Muslims refer to Friday as their day of rest and prayer. The Jewish calendar counts Saturday—the Sabbath—as the day of rest and worship. This idea is also shared by Seventh-Day Adventists.

First day of the week around the world

First day of the week around the world

This Euler problem shows that data science is not only about data: it is always how people interpret the world.

 

Create Air Travel Route Maps in ggplot: A Visual Travel Diary

I have been lucky to fly to a few countries around the world. Like any other bored traveller, I thumb through the airline magazines and look at the air travel route maps. These maps are beautifully stylised depictions of the world with gently curved lines between the many destinations serviced by the airline. I always wanted such a map for my own travel adventures. In this article I explain how to create a map of your own travels in the style of the Emirates Airlines route map.

Create Air Travel Route Maps using ggplot2

The first step was to create a list of all the places I have flown between at least once. Paging through my travel photos and diaries, I managed to create a pretty complete list. The structure of this document is simply a list of all routes (From, To) and every flight only gets counted once. The next step finds the spatial coordinates for each airport by searching Google Maps using the geocode function from the ggmap package. You will need a Google API to enable the geocoding function.

In some instances, I had to add the country name to avoid confusion between places. To prevent errors from the Google maps API, I have added a while loop that runs until all destinations have been geocoded.

We now we have a data frame of airports with their coordinates and can create air travel route maps. The data frames are merged so that we can create air travel route maps using the curve geom. The borders function of ggplot2 creates the map data. The ggrepel package helps to prevent overplotting of text. This code also removes any return flights and splits flights that crossed the date line.

My personal Air Travel Route Maps in ggplot: A Visual Travel Diary

My personal Air Travel Route Maps in ggplot: A Visual Travel Diary

You can view the recent version of the code and associated files in GitHub. In another post I have used the same principle to create a route map of flights between islands in the Pacific Ocean using the schedules from several international airlines.

Mapping antipodes using the ggmap package

When I was a kid, I was was fascinated by the conundrum of what happens when you drill a hole straight through the centre of the earth. I always believed that I would turn up in Australia. But is this really the case?

The antipodes of any place on earth is the place that is diametrically opposite to it. A pair of antipodes are connected by a straight line running through the centre of the Earth. These points are as far away from each other as is possible on this planet. Two people are antipodes when they live on opposite sides of the globe. Their feet (πούς/pous in Ancient Greek) are directly opposite each other.

How can we use coding in R to solve this conundrum?

Mapping antipodes with ggmap

We can roughly recreate the antipodean map on Wikipedia with the globe package. This package, written by Adrian Baddeley, plots 2D and 3D views of the earth. The package contains a data file with major coastlines that can be used to create a flipped map of the world.

The package contains a data file with major coastlines that can be used to create a flipped map of the world. To turn a spatial location into its antipode you subtract 180 degrees from the longitude and reverse the sign of the latitude, shown below.

Antipodean globe.

Antipodean globe.

We can also use the ggmap package to visualise antipodes. This package, developed by David Kahle antipodean R-guru Hadley Wickham, has a neat geocoding function to obtain a spatial location. You will need a Google API to enable the geocoding function.

The antipode function takes the description of a location and a zoom level to plot a dot on the antipode location. The gridExtra package is used to create a faceted map, which is not otherwise possible in ggmap.

This code solves the problem I was thinking about as a child. Running the code shows that the antipodes location of the home I grew up in is not in Australia, but quite a way south of New Zealand. Another childhood fantasy shattered … You can also view this code on GitHub.

Mapping antipodes: Antipode map of my birthplace, Hoensbroek, the Netherlands.

Antipode map of my birthplace, Hoensbroek, the Netherlands.