This Data Science for Water Utilities chapter implements cluster analysis to segment customers using hierarchical clustering and k-means.

Clustering Customers to Define Segments

Peter Prevos

The ideal form of customer service is personal attention, where the needs of each individual are met. Unfortunately, this level of service is usually impossible or too costly to achieve. Service providers therefore segment their customers into groups with similar characteristics. Cluster analysis is a commonly used segmentation technique that divides an unlabelled dataset into groups of observations with similar properties. This chapter of Data Science for Water Utilities shows how to detect patterns and define segments in customer data. The learning objectives for this chapter are:

  • Understand the principles of customer segmentation
  • Apply and interpret hierarchical cluster analysis
  • Apply and interpret k-means clustering

Data Science for Water Utilities

Data Science for Water Utilities, published by CRC Press, is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems with the R language for statistical computing.

The data and code used in this chapter are available on GitHub:

Principles of Customer Segmentation

Ideally, each customer of a customer-centric service receives individual attention. For large organisations this is very costly, yet treating everybody the same serves customers poorly. Customer segmentation is the middle ground: it groups customers into segments with similar needs. Segments are commonly defined using four types of variables:

  • Demographic: Age, gender, income, education, ethnicity.
  • Behavioural: Purchasing habits, spending habits, water consumption.
  • Psychographic: Interests, lifestyle, motivations, and water-related priorities.
  • Geographic: Town, postal code, water system.

Hierarchical Cluster Analysis

This example contains data from ten hypothetical customers (A–J). The first dimension in the test data is the average annual water consumption, and the second is the size of the land on which the house stands. The clusters should be easy to spot in the chart.

Cluster analysis example.

Hierarchical cluster analysis is a deterministic method: the same data always produces the same clusters. The method calculates the distance between every pair of data points, so the number of comparisons grows with the square of the number of observations, which can be problematic when analysing large amounts of data. The tree diagram (dendrogram) shows how all the points in the chart relate to each other.

Hierarchical clustering example.
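
The steps described above can be sketched in base R. This is a minimal sketch, not the code from the book: the data values are invented for illustration, and the names `customers`, `consumption` and `land_size` are assumptions. It standardises both variables, calculates the pairwise distances, builds the tree and cuts it into two clusters.

```r
# Hypothetical data: values invented for illustration,
# not the data set used in the book.
customers <- data.frame(
  consumption = c(110, 125, 140, 150, 160, 240, 260, 275, 290, 310), # kL per year
  land_size   = c(300, 350, 320, 400, 380, 800, 900, 850, 950, 1000), # square metres
  row.names   = LETTERS[1:10]
)

# Standardise so both variables contribute equally to the distances
customers_scaled <- scale(customers)

# Euclidean distance between every pair of customers
d <- dist(customers_scaled, method = "euclidean")

# Agglomerative clustering with complete linkage (the hclust default)
hc <- hclust(d, method = "complete")

# Draw the dendrogram and cut the tree into two clusters
plot(hc, main = "Hypothetical customers", xlab = "", sub = "")
segments <- cutree(hc, k = 2)
print(segments)
```

Standardising with `scale()` is a design choice in this sketch: without it, the variable with the largest numeric range (here land size) would dominate the distances.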

k-Means Clustering

The k-means method uses a stochastic approach: it starts from randomly chosen cluster centres, so the outcome can differ between runs when some observations sit between clusters. However, this method can digest much larger data sets than the hierarchical method. Another difference from the hierarchical method is that in k-means, you need to specify the number of clusters before the analysis starts.
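
A minimal sketch of k-means in base R follows, again using invented data and assumed variable names rather than the book's own code. Setting a seed makes the random starting centres reproducible, and `nstart` repeats the algorithm from several random starts and keeps the best solution.

```r
# Hypothetical data: values invented for illustration,
# not the data set used in the book.
customers <- data.frame(
  consumption = c(110, 125, 140, 150, 160, 240, 260, 275, 290, 310), # kL per year
  land_size   = c(300, 350, 320, 400, 380, 800, 900, 850, 950, 1000), # square metres
  row.names   = LETTERS[1:10]
)
customers_scaled <- scale(customers)

# k-means starts from random centres, so fix the seed for reproducibility
set.seed(42)

# Request two clusters; nstart = 25 tries 25 random starts and keeps the best
km <- kmeans(customers_scaled, centers = 2, nstart = 25)

print(km$cluster) # cluster membership of each customer
print(km$centers) # cluster centres in standardised units
```

The cluster numbers themselves carry no meaning and may swap between runs; only the grouping matters.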

The elbow method plots the within-cluster sum of squares against the number of clusters. The point where the curve bends most sharply (the elbow) is most likely the ideal number of clusters, because adding more clusters beyond it yields only small improvements.

k-Means cluster analysis elbow method.
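
The elbow method can be sketched as a loop over candidate numbers of clusters. This is an illustrative sketch with invented data; the upper limit of six clusters (`max_k`) is an arbitrary assumption.

```r
# Hypothetical data: values invented for illustration,
# not the data set used in the book.
customers <- data.frame(
  consumption = c(110, 125, 140, 150, 160, 240, 260, 275, 290, 310), # kL per year
  land_size   = c(300, 350, 320, 400, 380, 800, 900, 850, 950, 1000), # square metres
  row.names   = LETTERS[1:10]
)
customers_scaled <- scale(customers)

set.seed(42)
max_k <- 6 # arbitrary upper limit for this sketch

# Total within-cluster sum of squares for k = 1 to max_k
wss <- sapply(1:max_k, function(k) {
  kmeans(customers_scaled, centers = k, nstart = 25)$tot.withinss
})

# The sharpest bend in this curve suggests the number of clusters
plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Within-cluster sum of squares")
```

For this invented data set, the largest drop occurs between one and two clusters, after which the curve flattens.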

Interpreting Cluster Analysis

Cluster analysis methods are a form of unsupervised machine learning. The computer detects patterns in the data but cannot relate them to meaning. The results of a cluster analysis require human interpretation to link them to the context of the data.

In this simple example, we could label the two clusters as households with a garden and households without one.
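
One way to support this interpretation is to summarise each cluster in the original units. The sketch below uses invented data and assumed names (`customers`, `profile`); it is not taken from the book. It calculates the mean consumption and land size for each cluster.

```r
# Hypothetical data: values invented for illustration,
# not the data set used in the book.
customers <- data.frame(
  consumption = c(110, 125, 140, 150, 160, 240, 260, 275, 290, 310), # kL per year
  land_size   = c(300, 350, 320, 400, 380, 800, 900, 850, 950, 1000), # square metres
  row.names   = LETTERS[1:10]
)

# Assign each customer to one of two clusters (hierarchical method)
segments <- cutree(hclust(dist(scale(customers))), k = 2)

# Mean of each variable per cluster, in the original units
profile <- aggregate(customers, by = list(cluster = segments), FUN = mean)
print(profile)
```

In this invented data set, one cluster combines large properties with high consumption, which an analyst might read as households with a garden. That label comes from the analyst, not from the algorithm.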

In reality, cluster analysis involves many more variables than in these simplified examples. The book provides a small case study that also uses categorical data.

Cluster Analysis to Segment Customers Screencast

Chapter ten of Data Science for Water Utilities explains the principles of cluster analysis for customer segmentation in more detail. This screencast demonstrates how to undertake cluster analysis to segment customers using the code explained in the book.

Clustering Customers to Define Segments.

Additional Resources

My book Customer Experience Management for Water Utilities, published by IWA Publishing, delves deeper into using marketing theory for urban water supplies.

Customer Experience Management for Water Utilities: Marketing Urban Water Supply

Practical framework for water utilities to become more focused on their customers following Service-Dominant Logic.

Other Chapters

Previous Chapter: Basic Linear Regression

Next Chapter: Working with Dates and Times

Feel free to contact me if you have any comments, suggestions or questions about this book.
