Sigma Online User Manual

Using Regression & CuSUM Tools

Introduction

The Regression Activity allows you to view correlations on a Site, view the CuSUM and set targets based on the CuSUM and Regression. This activity is not available from the main Activity Menu; it is only available by right-clicking a Site or through the Site Overview activity.

Navigation

Click on the Site in the Site Overview Activity

Click Actions in the top right of the screen to allow you to navigate to other activities within Sigma including Regression

Click on Regression

Alternatively, right-click on a Site and select Regression from the menu

Regression Correlations

On entry to the Regression function, a screen will be displayed which shows all existing Regression Correlations that have been created.

From here, it is possible to view and work with these existing correlations. Alternatively, new correlations can be created either manually, by selecting Add Correlation, or automatically, by selecting Discover Correlations and letting the system discover potential correlations. More details are provided below.

View Existing Correlation

To view an existing regression correlation, either right-click the image and select  or double-click the regression thumbnail.

You will now be taken to a new screen and see three tabs - Regression, CUSUM and Control Chart.

Add Correlation (Manual)

Correlations can also be set up manually by selecting Add Correlation and choosing the required correlation items within the setup screen.

Drag and drop the required Dependent Data Type by selecting from the dropdown. Options include Account, Meter, Monitoring Point and Channel.

Drag and drop the required Independent Data Type by selecting Associated Data from the dropdown.  Then choose the data you require, for example Degree Days or Occupancy.

Click on the selected Dependent Data Type, then click Refresh Chart to update the data.

Useful Tip

Once the Dependent and Independent Data Types and the start and end dates have been set, select one of the data types and press the "Refresh Chart" button. A graph showing the data for the period is then displayed at the bottom of the window.

This is useful for checking that the data is available and complete for the whole period.

In the screenshot above, HH consumption data from a Meter is being plotted against Occupancy data in an Associated Data item.


Field Details
Name

This is the name of the correlation that will be saved.

If the "Use default name" checkbox is selected, the system will automatically generate a name based on the names of the dependent and independent data types that have been selected for the Regression.


Selecting from Data of Type

When selecting the data to use in the Regression, both a Dependent Data Type and Independent Data Type must be selected.

This drop down list allows you to select which streams of data you wish to use in the regression based on the data-sets which are available for the selected Site.


Dependent Data Type

This is the data stream which might be influenced by the independent data type. It is usually metering data relating to electricity or gas consumed, or potentially electricity generated. In terms of Sigma items, this could be a meter, periodic channel, non-periodic channel or virtual meter.

This can be set by "dragging and dropping" the appropriate item from the "Available Data Type" in the list above.


Independent Data Type

This is the data which might influence the dependent data type (for example, degree days, air temperature, solar irradiance, production output etc.). In terms of Sigma items, this will usually be Associated Data and will show all the items that are available for the Site.

This can be set by "dragging and dropping" the appropriate item from the "Available Data Type" in the list above.


Dates

This sets the start and end date that the regression should be set for and the Timezone that should be used.  The regression that is initially created will be for data that falls between these dates.

Note - the dates can be updated subsequently when working with the Regression.


Interval Period

The granularity that should be used when creating the Regression - i.e. the number of unique data points that should be used. The options here are:

  • Half Hour
  • One Hour
  • One Day
  • One Week
  • One Month

For example, if a year long period is selected and an interval of One Month is used, then 12 data points will be included in the regression. Conversely, if One Week was selected, then 52 points would be included.

Note - this should be set according to the length of time the Regression is being created for. If looking at a longer period of time, then a longer interval period might be used.
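The relationship between the period, the interval and the number of data points can be sketched as follows. This is only an illustration of the arithmetic described above, not how Sigma counts points internally; the function name count_points and the example dates are hypothetical.

```python
from datetime import datetime, timedelta

# Fixed-length intervals. "One Month" is handled separately because
# calendar months vary in length.
INTERVALS = {
    "Half Hour": timedelta(minutes=30),
    "One Hour": timedelta(hours=1),
    "One Day": timedelta(days=1),
    "One Week": timedelta(weeks=1),
}


def count_points(start, end, interval):
    """Estimate the number of whole intervals between start and end."""
    if interval == "One Month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    return (end - start) // INTERVALS[interval]


# Hypothetical year-long regression period.
start = datetime(2019, 1, 1)
end = datetime(2020, 1, 1)
for name in ["One Month", "One Week", "One Day", "One Hour", "Half Hour"]:
    print(name, count_points(start, end, name))
```

For this year-long period the sketch gives 12 monthly points and 52 weekly points, matching the example above, while a Half Hour interval gives 17,520 points.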


Discover Correlations (Automatic)

By using this function, Sigma will automatically check the system to find any correlations with an R² value of 0.9 or greater based on the available sets of data within the Site.

The R² value is called the coefficient of determination and is a statistical measure of how close the data points are to the regression line. Typically, a value of 0.9 or above represents a good correlation and indicates that the two datasets are related, i.e. 90% of the variation in consumption is explained by the influencing dataset.
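As a rough illustration of how a coefficient of determination is obtained for a straight-line fit, the sketch below uses ordinary least squares on a small made-up dataset. The function names and the sample degree day and consumption figures are hypothetical, and this is not a description of Sigma's internal calculation.

```python
def linear_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def r_squared(xs, ys):
    """Share of the variation in ys that the fitted line explains."""
    gradient, intercept = linear_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example: degree days (independent) against consumption in kWh (dependent).
degree_days = [10, 20, 30, 40, 50]
consumption = [120, 205, 310, 395, 500]

print(linear_fit(degree_days, consumption))
print(r_squared(degree_days, consumption))
```

With this sample data the points lie very close to a straight line, so the result is well above the 0.9 threshold.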

Correlations will be between two data sets, and those found automatically will display as follows: the first item is the dependent data type and the second is the independent data type.

Regression

Upon entering a Regression Correlation, the Regression screen will be displayed in the context of the Chart Tab.

This is where the Dependent Data Type (Y axis or vertical axis) is plotted against the Independent Data Type (X axis or horizontal axis), so that we can start working with the relationship between the two datasets and determine what expected performance might be. There are a number of components in this screen, which are explained in the subsequent sections below.

Note - the Regression Correlation that is created is read only. If customisation of the regression is required (e.g. to exclude specific data points or manually set the gradient or intercept), then a new regression line needs to be created. Please see Regression Lines below on how to do this.

To export the regression data, click on Export.

This will create a zip file called "Regression.zip" that contains:

  • A PNG image of the graph for the selected Regression Correlation
  • A PNG image of the overview graph for the dependent and independent variable datasets
  • An Excel file containing the data for each of the data points (as per the table shown in the "Table" tab)

Chart Tab

The graph plots the Dependent Data against the Independent Data and draws the regression line (line of best fit) based on all the data available between the dates that have been selected.

Where the points are green they are included in the regression. It is possible to exclude data points from the regression if there is a desire to remove these from determining the expected performance. Where this is the case, these would be shown in red.

Overview

  • The Overview chart shows the dependent data and independent data for the regression period.
  • Dragging your mouse over this chart will update the regression period displayed, and automatically update the Regression Period dates.
  • The Overview Graph can be minimised by clicking the Overview header bar

Details

This section shows a number of details relating to the relationship that has been established between the two datasets.

  • Gradient
  • Intercept
  • Correlation Coefficient (R) - Correlation coefficients are used to measure the strength of the relationship between two variables. 
    • 1 indicates a strong positive relationship
    • -1 indicates a strong negative relationship
    • a result of zero indicates no relationship at all.
  • Coefficient of Determination (R²) - a statistical measure of how close the data points are to the regression line, which can be used to assess how well the model explains the data and predicts future outcomes.
    • Typically, a value of 0.9 or above represents a good correlation and indicates that the two datasets are related, i.e. 90% of the variation in consumption is explained by the influencing dataset

The gradient or intercept can be adjusted by selecting  and entering a value in the popup box that is displayed.

      

To remove the custom gradient or intercept click 

Note - when a custom gradient is set the "R" values are not available and will be shown as "N/A".

Calculate

This feature allows the dependent variable to be calculated based on the Regression Correlation that has been created.

For example, suppose the independent variable (x-values) represents occupancy and the dependent variable (y-values) represents electricity consumption. Entering an occupancy of 50 and clicking "Evaluate" would then calculate what the electricity consumption should be.
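For a linear Regression Line, this kind of calculation amounts to substituting the entered value into the line equation y = gradient × x + intercept. The sketch below shows the arithmetic only; the gradient and intercept values are hypothetical and would in practice be read from the Details section.

```python
def evaluate(x_value, gradient, intercept):
    """Expected dependent value for a given independent value on a straight line."""
    return gradient * x_value + intercept


# Hypothetical regression details: 12 kWh per unit of occupancy plus a 300 kWh base load.
gradient = 12.0
intercept = 300.0

expected_kwh = evaluate(50, gradient, intercept)
print(expected_kwh)  # 900.0
```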

Regression Period

Use the start date and end date to choose the start period and end period for the regression, using either the drop down boxes or the calendar icon.

Selecting "Refresh" will restrict the data points on the graph to only those that fall within the two dates.

Regression Lines

This section allows new Regression Lines to be created, so that they can be manipulated, saved and then re-visited at any point in the future.

When initially creating a new Regression Correlation, the system will create:

  1. A Regression Line based on the name that was entered during creation. Where the system default option was used, it will create a name starting with "Original Correlation".
  2. A Regression Line called "New performance line" which is identical to the default line, but available in edit mode so it can be modified as required.

When entering the screen for Regression Lines and correlations that have previously been created, modified and saved, these will be displayed.

Creating New Regression Lines

To create a new Regression Line, click .

This opens the new Regression Line pop up.

Here the following details can be entered or updated:

  • Name - a bespoke name for the Regression Line to facilitate easy selection of the performance lines that have been created
  • Regression Type - the type of regression line that should be used, either:
    • Linear - the most commonly used type of predictive analysis through a linear approach to modelling the relationship between the dependent and independent variable datasets. This results in the creation of a 'straight' line relationship between the datasets.
    • Polynomial - a non-linear approach to modelling the relationship between the dependent and independent variable datasets, where there is still a correlation between the data but a straight line does not quite fit the trend. This results in the creation of a 'curved' line relationship between the datasets.
      • Note - where this type is used, the Gradient is set to "N/A"
  • Show original correlation line - a tick box to indicate whether the original correlation line between the two datasets (i.e. the read only correlation) should also be displayed on the graph
  • Has upper control line - a tick box to indicate whether an upper control line should appear on the graph. This is an upper tolerance above the regression line that has been created between the two datasets.
    • Upper limit - the upper limit tolerance that should be used - can be a fixed number of kWh, a set percentage or a number of standard deviations from the mean.
  • Has lower control line - a tick box to indicate whether a lower control line should appear on the graph. This is a lower tolerance below the regression line that has been created between the two datasets.
    • Lower limit - the lower limit tolerance that should be used - can be a fixed number of kWh, a set percentage or a number of standard deviations from the mean.
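
The three ways of expressing a limit can be sketched as below. This is an illustration under stated assumptions rather than Sigma's actual calculation: the function name limit_offset and the mode names are hypothetical, the percentage is assumed to be taken of the expected value on the regression line, and the standard deviation is assumed to be that of the differences between actual and expected values.

```python
from statistics import pstdev


def limit_offset(expected, residuals, mode, amount):
    """Distance of a control line from the regression line at one point.

    mode is one of "fixed" (kWh), "percent" or "stdev" (hypothetical names).
    """
    if mode == "fixed":
        return amount
    if mode == "percent":
        # Assumption: percentage of the expected value at this point.
        return abs(expected) * amount / 100
    if mode == "stdev":
        # Assumption: standard deviation of the actual-minus-expected differences.
        return amount * pstdev(residuals)
    raise ValueError(f"unknown mode: {mode}")


expected = 500.0                 # value on the regression line, in kWh
residuals = [4, -6, 4, -6, 4]    # made-up differences between actual and expected

upper = expected + limit_offset(expected, residuals, "percent", 10)
lower = expected - limit_offset(expected, residuals, "fixed", 25)
print(lower, upper)  # 475.0 550.0
```

The upper and lower limits are computed independently, mirroring the separate Upper limit and Lower limit settings.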

Editing Regression Lines

Right-clicking on a Regression Line and selecting edit will show the same pop-up outlined in "Creating New Regression Lines" above and allow the same details to be updated for the existing regression line.

Removing Regression Lines

Right-clicking on a Regression Line and selecting remove will show the following confirmation pop-up.

Clicking "Yes" will remove the Regression Line.

Selected Points - Including/Excluding Data Points

It is possible to exclude specific data points from the regression correlation, either individually or in bulk, in a variety of ways. This is useful for excluding specific outliers that may significantly skew the baseline performance.

Excluding points from the graph will recalculate the Regression Line.

This is managed via the Selected Points component of the screen, which can be seen on the left hand side. It can be achieved using the graph, using the time filter feature or updating the tabular data.

Using the Graph

The graph can be used by selecting individual points on the graph (holding down the CTRL key on the keyboard if wanting to select multiple) or highlighting multiple points on the graph by left clicking and dragging the mouse over the applicable data points. This will place a black border around the selected points and add each of them as a unique row in the Selected Points pane.


Clicking the "Exclude" button will then exclude the selected data points from the regression and change them to a red colour, which visually shows which points are excluded.

Clicking the "Include" button will re-enable selected data points that have previously been excluded.

Clicking "Clear Selection" will de-select any points that are highlighted and listed in the Selected Points pane.

Using the Time Filter

The Time Filter feature can be used to quickly remove data points which relate to particular time periods. It is only available where an interval of one day, one hour or half hour is used.

The view available will be applicable to the interval that has been used when creating the regression. For example, if a daily interval is used, then the days of the week would be available for inclusion/exclusion. Alternatively, if one hour was used, then a grid of the hourly time bands in the context of each day would be included where each 'cell' could be included or excluded.

Using the Table Data

The list of data points can be viewed in tabular form in the "Table" tab, which is described below.

Table Tab

The screen defaults to show the data in a graph. This can be changed to show a list of all the data points on that graph in tabular form.

Click on the Table tab at the top of the screen to display a list of the data points.

For each data point the table provides:

  • Date (and time if applicable)
  • X value - the value of the independent variable
  • Y value - the value of the dependent variable
  • Y deviation - the difference between the actual value and the expected performance based on the regression line (e.g. the regression line expects the consumption to be 50 based on the production output being 10. The actual value is 75, so the Y deviation would be 25).
    • A positive value indicates deviation above the regression line, a negative value indicates deviation below the regression line
  • Included - a tickbox which shows whether or not the data point is included in the regression. These can be selected or deselected as required.




Cumulative Sum (CuSUM)

Once the Regression Correlation has been created and the expected performance determined, then the CuSUM control chart can be used to view the performance of the dependent variable over time. More details about this can be seen on the introduction page here.

This can be accessed by clicking on either of the "CUSUM" links at the top of the screen:


Chart Tab

The graph shows the cumulative data based on the CuSUM period.

This is the cumulative sum, over time, of the differences between the actual value and the predicted value based on the regression, i.e. actual minus predicted (e.g. the regression expects the consumption to be 50 based on the production output being 10. The actual value is 75, so the difference would be 25). This data can be seen in the Table view, below.
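The calculation described above can be sketched in a few lines. The function name cusum and the sample figures are made up for illustration; the first period reuses the example of an expected value of 50 and an actual value of 75.

```python
from itertools import accumulate


def cusum(actuals, predictions):
    """Running total of (actual - predicted) for each period, in order."""
    differences = [a - p for a, p in zip(actuals, predictions)]
    return list(accumulate(differences))


# Made-up data for four consecutive periods.
actual = [75, 60, 48, 45]
predicted = [50, 50, 50, 50]

print(cusum(actual, predicted))  # [25, 35, 33, 28]
```

The running total rises while actual values exceed the prediction and falls once they drop below it, which is the change of direction to look for on the chart.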

Clicking and dragging your mouse over the graph will zoom into the data. Clicking the "Reset Zoom" button that appears will reset the graph to what it was.

This will highlight step changes in performance over time and highlight periods where there is significant performance degradation (i.e. the line on the graph goes up) or improvement (i.e. the line on the graph goes down). This trend over time may otherwise be hidden in the graph that was generated to create the Regression. This gives a much better view of the performance over time, so when looking at a CuSUM chart, the changes in direction of the line indicate events that have relevance to the energy consumption pattern.

In the example above, reviewing the chart allows you to quickly see that there was a step change in performance which started from February 2019, as per the annotation below, which isn't immediately obvious in the general correlation. The line goes down sharply after a prior consistent rising trend. This might trigger investigative action as to what changed at the start of February 2019 and lead to corrective action being taken to optimise the performance. Subsequently, the same method would be used to re-assess the performance over the updated period of time and validate that the performance has improved and the building is operating optimally. If corrective action was taken, and a new Regression line was created to also include the new period of time, then you would expect to see the line start going back "up" after the changes had been made to bring performance back on track.

This technique can then be used on an ongoing basis in the continuous improvements lifecycle to help identify and react to resolve issues effectively.

Overview

  • The Overview chart shows the dependent data and independent data for the CuSUM period.
  • Dragging your mouse over this chart will update the CuSUM period displayed, and automatically update the CuSUM Period dates.
    • If the CUSUM period is different to the Regression Period then you will see two colours highlighted on the overview. Blue is the Regression Period and purple the CUSUM Period.
  • The Overview Graph can be minimised by clicking the Overview header bar

Details

This section shows a number of details relating to the CuSUM that has been established.

  • CuSUM - This is the sum of the cumulative differences between the applicable dates in the CuSUM Period
  • CuSUM CO2 - This is the sum of the cumulative differences of CO2 emissions between the applicable dates in the CuSUM Period

CuSUM and CuSUM CO2 will always be 0.00 when moving straight from Regression without changing the CuSUM period. When the CuSUM period is changed, anything above the 0 value based on the regression period will be calculated and listed in both kWh and CO2 emissions.

Data Analysis

This section allows the creation of visual overlays which can be used to determine whether the performance shown in the CuSUM graph is out of control. A V-Mask is an overlay shape in the form of a V on its side that is superimposed on the graph of the cumulative sums. The origin point of the V-Mask (see diagram below) is placed on top of the latest cumulative sum point and past points are examined to see if any fall above or below the sides of the V.
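The idea of checking past points against the arms of a V-Mask can be sketched as follows. This is a simplified illustration of a full V-Mask only, not Sigma's implementation: the function name and the rise and slope parameters are hypothetical, and the truncated and snub-nosed variants are not modelled.

```python
def v_mask_exceptions(cusum, rise, slope):
    """Return the indices of past points that fall outside a full V-Mask.

    The mask origin sits on the latest point. Each arm starts `rise` above or
    below the origin and moves a further `slope` away per period going back.
    """
    last = len(cusum) - 1
    origin = cusum[last]
    exceptions = []
    for i, value in enumerate(cusum[:last]):
        distance = last - i  # number of periods back from the origin
        upper_arm = origin + rise + slope * distance
        lower_arm = origin - rise - slope * distance
        if value > upper_arm or value < lower_arm:
            exceptions.append(i)
    return exceptions


# Made-up CuSUM series with a sharp rise between the fourth and fifth periods.
cusum_values = [0, 2, 3, 5, 30, 32, 33]
print(v_mask_exceptions(cusum_values, rise=5, slope=2))  # [0, 1, 2, 3]
```

Here the four points before the jump fall below the lower arm and are reported, which corresponds to the points highlighted as exceptions on the graph.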

Tick the Show V-Mask box to add a V-Mask target onto the CuSUM.

To confirm target creation select "Yes" in the resulting popup window

By default a truncated V-Mask will be applied, adding a green layer to the graph.

The CuSUM points that now fall above the top arm or below the bottom arm of the V-Mask will be highlighted in red and shown on the graph as "exceptions".

There are 3 types of V-Mask targets available as follows

  • Full mask 
  • Snub-nosed mask 
  • Truncated mask 

The type of V-Mask that is used can be changed after a V-Mask has been created. This is explained in the "Targets" section below.

Tick Show Fixed Targets to add a CuSUM fixed target. To confirm target creation select "Yes" in the resulting popup window.

CuSUM Period

Use the start date and end date to choose the start period and end period for the CuSUM, using either the drop down boxes or the calendar icon.

Selecting "Refresh" will restrict the data points on the graph to only those that fall within the two dates.

Targets

This section displays any targets that have been created based on the CuSUM graph.

  • To change any of the settings, right-click on the target and select Edit.
  • To remove a target, right-click on the target and select Remove

 

If you edit a CuSUM V-Mask Target you will be presented with the following window. After changing any configuration, click "Recalculate" and the chart will be updated.

  

If you edit a CuSUM Fixed Target you will be presented with the following window. The target value can be entered as required and configured to end on a specific date. After changing any configuration, click "OK" and the chart will be updated.

  

Update the settings as required and click OK

The Targets will only create an event if the first point of data is outside the tolerance range by >5%, or if there are two consecutive points of data outside the tolerance range but within the 5% range.

Table Tab

Select the Table tab to show the data points that are plotted on the graph

The date displayed within the table is the end date for that period

The table and charts can be saved by click 

Control Chart

It is not possible to go directly from Regression to the Control Chart, as it is designed as a stepped approach.

Navigate from Regression to the Control Chart via CuSUM by clicking CUSUM then Control Chart

Graph

The graph will display data for the same period that has been set in the CuSUM

The graph displays:

  • Control Lines - accepted areas of data based on the control limit
  • Exceptions - areas which fall outside of the control limit
  • Difference line - data for the period being presented

Overview

The Overview Graph highlights the current target period by default and displays historic and future data

The Overview Graph can be minimised by clicking the Overview header bar

Targets

By default a New Control Target will be created for the user to amend as required. This target is not saved unless you click the Save button

The Targets section details the current control line parameters as well as allowing you to add and remove targets. 

Control Lines

To amend the control lines applied, change the control limit

Alternatively, you can manually drag the control lines on the graph. To do this, select  and click and drag on the graph to set the desired control limits

To remove the Control Limits from the graph uncheck .

The Targets will only create an event if the first point of data is outside the tolerance range by >5%, or if there are two consecutive points of data outside the tolerance range but within the 5% range.


© Copyright TEAM - Energy Auditing Agency Limited Registered Number 1916768 Website: www.teamenergy.com Telephone: +44 (0)1908 690018