This chapter of Data Science for Water Utilities is an introduction to machine learning using multiple linear regression and decision trees.

Introduction to Machine Learning

Peter Prevos

Machine learning is an approach to statistical analysis whereby a computer detects patterns in data and then uses the results to predict outcomes from new data. The combination of large amounts of cheaply available data and open source machine learning algorithms is causing a revolution in many industries, including water management. For example, water utilities can apply machine learning to predict which water or sewer main is likely to fail soon. This chapter of Data Science for Water Utilities introduces the principles of machine learning and the basic modelling process. The learning objectives for this chapter are:

  • Understand the principles of machine learning
  • Apply cross-validation to linear regression
  • Implement a decision tree prediction

Data Science for Water Utilities

Data Science for Water Utilities published by CRC Press is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems using the R language for statistical computing.

The data and code used in this chapter are available on GitHub:

Principles of Machine Learning

Machine learning is an area of study in computer science and statistics that involves developing algorithms that can automatically learn patterns from data and make predictions or decisions, without being explicitly programmed to do so.

We have already seen linear regression, which is a form of supervised learning. In this type of machine learning, the model is trained on a dataset that contains both input and output variables. Cluster analysis belongs to the unsupervised learning category. In these types of problems, the computer seeks structure in the data without an output (dependent) variable to guide it.
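The difference between the two categories can be sketched in a few lines of R. This is a minimal illustration on synthetic data, not the data or code from the book: the variable names `x1`, `x2` and `y` are hypothetical, and k-means stands in for cluster analysis in general.

```r
# Synthetic data for illustration only (not from the book's data sets)
set.seed(1)
x1 <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
x2 <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
y  <- 2 + 3 * x1 - x2 + rnorm(100)
d  <- data.frame(x1, x2, y)

# Supervised learning: the model is trained on inputs (x1, x2)
# together with a known output (y)
fit <- lm(y ~ x1 + x2, data = d)
predict(fit, newdata = data.frame(x1 = 1, x2 = 2))

# Unsupervised learning: k-means only sees the inputs and
# searches for groups without any output variable
clusters <- kmeans(d[, c("x1", "x2")], centers = 2, nstart = 10)
table(clusters$cluster)
```

The regression needs `y` to learn from, while the clustering step never uses it.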

Family tree of machine learning.

This chapter introduces multiple linear regression and decision trees.

Cross Validation and Fitting

Fitting a machine-learning model.
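As a rough sketch of how cross-validation can be applied to a linear regression, the example below uses k-fold cross-validation in base R. The page does not say which validation scheme the chapter uses, so the choice of five folds, the synthetic data and the helper `rmse()` are all assumptions made for illustration.

```r
# Synthetic data for illustration only
set.seed(123)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(n, sd = 0.5)

# Randomly assign each observation to one of k folds
k <- 5
fold <- sample(rep(1:k, length.out = n))

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit on k - 1 folds, then evaluate on the fold that was held out
cv_rmse <- sapply(1:k, function(i) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  fit   <- lm(y ~ x1 + x2, data = train)
  rmse(test$y, predict(fit, newdata = test))
})

cv_rmse        # error for each held-out fold
mean(cv_rmse)  # cross-validated estimate of the prediction error
```

Because every observation is used for testing exactly once, the averaged error is a fairer estimate of performance on new data than the error on the training set.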

Multiple Linear Regression

Concrete mixture decision tree.

Comparing models

RMSE
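For reference, the root mean squared error (RMSE) is commonly defined as below. The symbols are the usual textbook notation rather than notation taken from this page: $y_i$ is the observed value, $\hat{y}_i$ the model's prediction and $n$ the number of observations in the test set.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

A lower RMSE means the predictions are, on average, closer to the observed values, and the result is expressed in the same units as the output variable.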

Decision Tree

Concrete mixture decision tree.

Comparing models

Confusion matrix
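A confusion matrix cross-tabulates the observed classes against the predicted classes. The sketch below is a minimal illustration, assuming a classification tree fitted with the `rpart` package on synthetic data. It does not use the concrete mixture data that the figures refer to, and the variable names and the 80/20 split are hypothetical.

```r
library(rpart)

# Synthetic two-class data for illustration only
set.seed(7)
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$class <- factor(ifelse(d$x1 + d$x2 + rnorm(n, sd = 0.1) > 1, "high", "low"))

# Hold out 20% of the observations for testing
train_id <- sample(n, 0.8 * n)
train <- d[train_id, ]
test  <- d[-train_id, ]

# Fit a classification tree on the training data
tree <- rpart(class ~ x1 + x2, data = train, method = "class")

# Predict classes for the test data and tabulate against the truth
pred <- predict(tree, newdata = test, type = "class")
confusion <- table(actual = test$class, predicted = pred)
confusion

# Accuracy: share of test observations on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

The diagonal of the table counts correct predictions and the off-diagonal cells show which classes the model confuses with each other.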

Introduction to Machine Learning Screencast

Chapter thirteen of Data Science for Water Utilities explains the theory and application of machine learning in more detail. The screencast below reviews the code for this chapter.

Introduction to Machine Learning Screencast.
