International keynote at the Asset Data & Insights Conference in Auckland, 26 July 2017. Using virtual tags to create value from SCADA data.

How Virtual Tags have transformed SCADA data analysis

Peter Prevos

Yesterday, I delivered the International Keynote at the Asset Data & Insights Conference in Auckland, New Zealand (the place where R was initially developed). My talk was about creating value from SCADA data using a method I developed with my colleagues called Virtual Tags. My address started with my views on data science strategy, which I also presented to the R User Group in Melbourne. In this post, I would like to explain what Virtual Tags are and how they can be used to improve the value of SCADA data.

SCADA Systems at Water Treatment Plants

Water treatment plants are mostly fully automated, using analysers to measure the process and the SCADA system to communicate this data. For those of you not familiar with water treatment plants, the video below gives a cute summary of the process.

Water Treatment — SCADA Plant IQ

Water treatment plants need sensors to measure a broad range of parameters. These instruments record data 24 hours per day to control operations. When the process operates effectively, all values fall within a narrow band. All these values are typically stored by the SCADA system for a year, after which they are destroyed to save storage space.

Water treatment plants measure turbidity (clarity of the water) to assess the effectiveness of filtration. The code snippet below simulates the measurements from a turbidity instrument at a water treatment plant over five hours, with one reading per minute. Most water quality data has a log-normal distribution with a small standard deviation.

  # Virtual tag simulation

  # Generate turbidity measurements
  set.seed(1234)
  n <- 300
  wtp <- data.frame(timestamp = seq.POSIXt(ISOdate(1910, 1, 1), length.out = n, by = 60),
                    turbidity = rlnorm(n, log(.1), .01))

  library(ggplot2)

  p1 <- ggplot(wtp, aes(x = timestamp, y = turbidity)) +
    geom_line(colour = "grey") +
    ylim(0.09, 0.11) + 
    theme_bw(base_size = 10) + 
    labs(title = "Turbidity simulation", x = "Timestamp", y = "Turbidity")

  p1
Turbidity simulation.
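
To see why the simulated values stay inside such a narrow band, it helps to look at the log-normal parameters themselves. The sketch below is an illustrative check, not part of the original analysis, and the variable names are my own. With a log-mean of log(0.1) and a log-standard deviation of 0.01, the median is 0.1 and almost all values fall within roughly 3% either side of it.

```r
# Illustrative check of the log-normal parameters used in the simulation
meanlog <- log(0.1)
sdlog <- 0.01

# The theoretical median of a log-normal distribution is exp(meanlog)
median_theory <- exp(meanlog)

# The central 99.7% of values lie between these two quantiles
band <- qlnorm(c(0.0015, 0.9985), meanlog, sdlog)

# Compare the theory against a large simulated sample
set.seed(1234)
x <- rlnorm(1e5, meanlog, sdlog)
round(c(median = median(x), lower = band[1], upper = band[2]), 4)
```

This also explains why the plot limits of 0.09 and 0.11 comfortably contain every simulated reading.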

SCADA Historian

The data generated by the SCADA system is used to make operational decisions. The data is created and structured to make decisions in the present, not to solve problems in the future. SCADA Historian systems archive this information for future analysis. Historian systems only store a new value when the reading differs from the last stored value by more than a set percentage, a technique known as dead banding. This method saves storage space without sacrificing much accuracy.

For example, when an instrument reads 0.20 and the limit is 5%, a new value is only recorded when it falls below 0.19 or rises above 0.21. The next value is stored when it deviates more than 5% from this newly stored value, and so on. The code snippet below simulates this behaviour on the earlier simulated turbidity readings, using a 3% limit. This Historian only stores the data points marked in blue.

  # Historise using dead banding
  threshold <- 0.03 # Store a value when it differs more than 3% from the last stored one
  h <- 1 # Index of the last historised point

  wtp$historised <- FALSE
  wtp$historised[c(1, n)] <- TRUE # Always store the first and last points

  # Test each reading against the last historised value
  for (i in 2:nrow(wtp)) {
    delta <- wtp$turbidity[i] / wtp$turbidity[h]
    if (delta > (1 + threshold) || delta < (1 - threshold)) {
      wtp$historised[i] <- TRUE
      h <- i
    }
  }

  p2 <- p1 + 
    geom_point(data = subset(wtp, historised), 
               aes(x = timestamp, y = turbidity), 
               size = 3, alpha = .5, color = "blue") +
    labs(title = "Historised data")

  p2
Historised simulated turbidity data.
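
The dead-band rule is easier to follow on a handful of numbers. The sketch below wraps the same logic in a small function and applies it to the 0.20 reading and 5% limit from the example above. The function name deadband() and the readings are hypothetical and only serve as an illustration.

```r
# Illustrative dead-band filter (function name and readings are hypothetical)
deadband <- function(x, threshold) {
  stored <- logical(length(x))
  stored[1] <- TRUE  # The first reading is always stored
  h <- 1             # Index of the last stored reading
  for (i in seq_along(x)[-1]) {
    ratio <- x[i] / x[h]
    if (ratio > 1 + threshold || ratio < 1 - threshold) {
      stored[i] <- TRUE
      h <- i
    }
  }
  stored
}

readings <- c(0.200, 0.205, 0.195, 0.212, 0.215, 0.220, 0.230, 0.200)
stored <- deadband(readings, threshold = 0.05)
data.frame(readings, stored)
```

Note that each reading is compared with the last stored value, not with the previous reading. The value 0.220 is 10% above the first reading, but it is less than 5% above the stored value of 0.212, so it is not kept.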

Virtual Tags

This standard method of generating and storing SCADA data works fine for operating systems but does not work well when using the data for post hoc analysis. The data in the Historian is an unequally-spaced time series, which makes it harder to analyse. Using constant interpolation, the Virtual Tag approach expands this unequally-spaced time series into an equally-spaced one.

The vt() function undertakes the constant interpolation using the approx() function. It is applied to all the timestamps in the simulated data, using only the historised data points as input. The red line shows how the value stays constant until it jumps by more than the 3% limit. This example demonstrates that we have a steady process with some minor spikes, which is the expected outcome of this simulation.

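  # Virtual Tags interpolation

  # Keep only the data points that the Historian stored
  historian <- subset(wtp, historised)

  # Constant interpolation: carry the last stored value forward to time t
  vt <- function(t) {
    approx(historian$timestamp, historian$turbidity, xout = t, method = "constant")
  }

  # approx() is vectorised, so all timestamps can be passed at once
  wtp$virtual_tag <- vt(wtp$timestamp)$y

  p3 <- p2 + geom_line(data = wtp, aes(x = timestamp, y = virtual_tag), colour = "red") +
    ggtitle("Virtual Tags")
  p3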
Virtual tags for the simulated data.
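
Constant interpolation is perhaps easiest to understand on a tiny series. The sketch below uses made-up times (in minutes) and values to show how approx() with method = "constant" carries each stored value forward until the next stored value arrives.

```r
# Illustrative constant interpolation on a tiny, irregular series
# (times in minutes and values are made up for this example)
stored_time <- c(0, 3, 4, 9)
stored_value <- c(0.20, 0.212, 0.23, 0.20)

# Regular one-minute grid covering the same period
grid <- 0:9

# method = "constant" with the default f = 0 returns the most recent
# stored value at or before each grid time
virtual_tag <- approx(stored_time, stored_value, xout = grid, method = "constant")$y
data.frame(minute = grid, virtual_tag)
```

Four stored values become ten equally-spaced ones, which is the step pattern that the red line traces in the plot.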

The next step in Virtual Tags is to combine tags from different instruments. For example, we might only be interested in the turbidity readings when the filter was running. We can find these by combining the turbidity data with the filter's valve status or flow.
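
Combining tags could look something like the sketch below. The tag names, times and values are hypothetical, and the helper expand_tag() is my own illustration rather than part of the Virtual Tags implementation. Both tags are expanded to the same one-minute grid, after which a simple filter selects the turbidity values recorded while the valve status shows the filter running.

```r
# Illustrative combination of two virtual tags on a shared time grid
# (tag names, times and values are hypothetical)
grid <- 0:9  # Minutes

# Historised turbidity readings and filter valve status (1 = running)
turbidity <- data.frame(time = c(0, 3, 4, 9), value = c(0.20, 0.212, 0.23, 0.20))
valve <- data.frame(time = c(0, 2, 6, 9), value = c(0, 1, 0, 0))

# Expand a tag to the regular grid with constant interpolation
expand_tag <- function(tag, grid) {
  approx(tag$time, tag$value, xout = grid, method = "constant")$y
}

combined <- data.frame(
  minute = grid,
  turbidity = expand_tag(turbidity, grid),
  running = expand_tag(valve, grid) == 1
)

# Keep only the turbidity values recorded while the filter was running
subset(combined, running)
```

Because both tags share the same timestamps after expansion, no fuzzy matching of irregular times is needed.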

This approach might seem cumbersome, but it simplifies analysing data from the SCADA Historian. Virtual Tags enable a catalogue of analytical processes that would otherwise be hard to carry out. This system also adds context to the SCADA information by linking tags to each other and to the processes they describe. If you are interested in more detail, then please download the technical manual for Virtual Tags, which also describes how they are implemented in SQL.

If you want to learn more about using R code to solve water problems, look at the book Data Science for Water Utilities.

Data Science for Water Utilities

Data Science for Water Utilities, published by CRC Press, is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems using the R language for statistical computing.
