Sigma Online User Manual
Using Regression & CuSUM Tools
Introduction
The Regression Activity allows you to view correlations on a Site, view the CuSUM, and set targets based on the CuSUM and Regression. This activity is not available from the main Activity Menu; it is only available by right-clicking a Site or through the Site Overview activity.
Navigation
Click on the Site in the Site Overview Activity
Click Actions in the top right of the screen to allow you to navigate to other activities within Sigma including Regression
Click on Regression
Alternatively, right-click on a Site and select Regression from the menu
Regression Correlations
On entry to the Regression function, a screen will be displayed which shows all existing Regression Correlations that have been created.
From here, it is possible to view and work with these existing correlations. Alternatively, new correlations can be created either manually or by letting the system automatically discover potential correlations. More details are provided below.
View Existing Correlation
To view an existing regression correlation, either right-click the image and select or double-click the regression thumbnail.
You will now be taken to a new screen and see three tabs - Regression, CUSUM and Control Chart.
Add Correlation (Manual)
Correlations can also be set up manually by choosing the required correlation items within the setup screen.
Drag and drop the required Dependent Data Type by selecting from the dropdown. Options include Account, Meter, Monitoring Point, Channel.
Drag and drop the required Independent Data Type by selecting Associated Data from the dropdown. Then choose the data you require, for example Degree Days or Occupancy.
Click on the selected Dependent Data Type, then click Refresh Chart to update the data.
Useful Tip
Once the Dependent and Independent Data Types and the start and end dates have been set, select one of the data types and press the "Refresh Chart" button. A graph showing the data for the period is then displayed at the bottom of the window.
This is useful for checking that the data is available and complete for the whole period.
In the screenshot above, HH consumption data from a Meter is being plotted against Occupancy data in an Associated Data item.
Field | Details
---|---
Name | The name under which the correlation will be saved. If the "Use default name" checkbox is selected, the system automatically generates a name based on the names of the dependent and independent data types selected for the regression.
Selecting from Data of Type | When selecting the data to use in the regression, both a Dependent Data Type and an Independent Data Type must be selected. This drop-down list allows you to select which streams of data you wish to use in the regression, based on the datasets available for the selected Site.
Dependent Data Type | The stream of data which might be influenced by the independent data type. It is usually metering data relating to electricity or gas consumed, or potentially electricity generated. In terms of Sigma items, this could be a meter, periodic channel, non-periodic channel or virtual meter. It is set by dragging and dropping the appropriate item from the "Available Data Type" list above.
Independent Data Type | The data which might influence the dependent data type (for example, degree days, air temperature, solar irradiance, production output etc.). In terms of Sigma items, this will usually be Associated Data, and all the items available for the Site are shown. It is set by dragging and dropping the appropriate item from the "Available Data Type" list above.
Dates | The start and end dates of the period the regression should cover, and the Timezone that should be used. The regression that is initially created uses data that falls between these dates. Note - the dates can be updated subsequently when working with the Regression.
Interval Period | The granularity that should be used when creating the Regression, i.e. the number of unique data points that should be used. For example, if a year-long period is selected and an interval of One Month is used, then 12 data points will be included in the regression. Conversely, if One Week was selected, then 52 points would be included. Note - this should be set according to the length of time the Regression is being created for. If looking at a longer period of time, then a higher interval period might be used.
Discover Correlations (Automatic)
By using this function, Sigma will automatically check the system to find any correlations with an R² value of 0.9 or greater, based on the available sets of data within the Site.
The R² value is called the coefficient of determination and is a statistical measure of how close the data points are to the regression line. Typically, a value of 0.9 or above represents a good correlation and indicates that the two datasets are related, i.e. around 90% of the variation in consumption is explained by the influencing dataset.
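As an illustration of what the 0.9 threshold measures, the following is a minimal sketch of a straight-line least-squares fit and its coefficient of determination. It is not Sigma's implementation; the function names and the sample degree-day and consumption figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def r_squared(xs, ys, gradient, intercept):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical monthly data: degree days (independent) and kWh (dependent).
degree_days = [50, 100, 150, 200, 250, 300]
consumption_kwh = [1250, 2150, 3300, 4100, 5350, 6050]

gradient, intercept = fit_line(degree_days, consumption_kwh)
r2 = r_squared(degree_days, consumption_kwh, gradient, intercept)

# A pair of datasets would be reported by the discovery step only if this is True.
meets_threshold = r2 >= 0.9
```

With this sample data the points lie very close to a straight line, so the R² value is well above the 0.9 threshold.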
Correlations are between two datasets. Those found automatically are displayed as follows: the first item is the dependent data type and the second is the independent data type.
Regression
Upon entering a Regression Correlation, the Regression screen is displayed in the context of the Chart Tab.
This is where the Dependent Data Type (Y axis or vertical axis) is plotted against the Independent Data Type (X axis or horizontal axis), and where we can start working with the relationship between the two datasets and determine what the expected performance might be. There are a number of components in this screen, which are explained in the sections below.
Note - the Regression Correlation that is created is read-only. If customisation of the regression is required (e.g. to exclude specific data points or manually set the gradient or intercept), then a new regression line needs to be created. Please see Regression Lines below on how to do this.
To export the regression data, click on Export.
This will create a zip file called "Regression.zip" that contains:
- A PNG image of the graph for the selected Regression Correlation
- A PNG image of the overview graph for the dependent and independent variable datasets
- An Excel file containing the data for each of the data points (as per the table shown in the "Table" tab)
Chart Tab
The graph plots the Dependent Data against the Independent Data and draws the regression line (line of best fit) based on all the data available between the dates that have been selected.
Where the points are green they are included in the regression. It is possible to exclude data points from the regression if there is a desire to remove these from determining the expected performance. Where this is the case, these would be shown in red.
Overview
- The Overview chart shows the dependent data and independent data for the regression period.
- Dragging your mouse over this chart will update the regression period displayed, and automatically update the Regression Period dates.
- The Overview Graph can be minimised by clicking the Overview header bar
Details
This section shows a number of details relating to the relationship that has been established between the two datasets.
- Gradient
- Intercept
- Correlation Coefficient (R) - measures the strength of the relationship between two variables.
  - A value of 1 indicates the strongest possible positive relationship
  - A value of -1 indicates the strongest possible negative relationship
  - A value of zero indicates no linear relationship at all
- Coefficient of Determination (R²) - a statistical measure of how close the data points are to the regression line, and therefore of how well the model explains the data and can be used to predict future outcomes.
  - Typically, a value of 0.9 or above represents a good correlation and indicates that the two datasets are related, i.e. around 90% of the variation in consumption is explained by the influencing dataset
The gradient or intercept can be adjusted by selecting and entering a value in the popup box that is displayed.
To remove the custom gradient or intercept click
Note - when a custom gradient is set the "R" values are not available and will be shown as "N/A".
Calculate
This feature allows the dependent variable to be calculated based on the Regression Correlation that has been created.
For example, suppose the independent variable (x-values) represents occupancy and the dependent variable (y-values) represents electricity consumption. Entering 50 and clicking "Evaluate" would then calculate what the electricity consumption should be at an occupancy of 50.
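For a linear regression, this calculation amounts to evaluating the fitted straight line at the value entered. A minimal sketch, assuming a hypothetical gradient and intercept (the figures below are illustrative and not taken from Sigma):

```python
def evaluate(gradient, intercept, entered_value):
    """Expected value on the regression line for the value entered."""
    return gradient * entered_value + intercept


# Hypothetical fitted values for an occupancy vs electricity correlation.
gradient = 12.5     # kWh per occupant (assumed)
intercept = 300.0   # kWh consumed at zero occupancy (assumed)

# Entering an occupancy of 50 gives the expected electricity consumption.
expected_kwh = evaluate(gradient, intercept, 50)
```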
Regression Period
Use the start date and end date to choose the start and end of the period for the regression, either by selecting from the drop-down boxes or by using the calendar icon.
Selecting "Refresh" will restrict the data points on the graph to only those that fall within the two dates.
Regression Lines
This section allows new Regression Lines to be created, so that they can be manipulated, saved and then re-visited at any point in the future.
When initially creating a new Regression Correlation, the system will create:
- A Regression Line based on the name that was entered during creation. Where the system default option was used, it will create a name starting with "Original Correlation".
- A Regression Line called "New performance line" which is identical to the default line, but available in edit mode so it can be modified as required.
When entering the screen for a correlation whose Regression Lines have previously been created, modified and saved, these will be displayed.
Creating New Regression Lines
To create a new Regression Line, click .
This opens the new Regression Line pop up.
Here the following details can be entered or updated:
- Name - a bespoke name for the Regression Line, to make it easy to select from the performance lines that have been created
- Regression Type - the type of regression line that should be used, either:
- Linear - the most commonly used type of predictive analysis through a linear approach to modelling the relationship between the dependent and independent variable datasets. This results in the creation of a 'straight' line relationship between the datasets.
- Polynomial - a non-linear approach to modelling the relationship between the dependent and independent variable datasets, where there is still a correlation between the data but a straight line does not quite fit the trend. This results in the creation of a 'curved' line relationship between the datasets. Note - where this type is used, the Gradient is set to "N/A".
- Show original correlation line - a tick box to indicate whether the original correlation line between the two datasets (i.e. the read-only correlation) should also be displayed on the graph
- Has upper control line - a tick box to indicate whether an upper control line should appear on the graph. This is an upper tolerance above the regression line that has been created between the two datasets.
- Upper limit - the upper limit tolerance that should be used - can be a fixed number of kWh, a set percentage or a number of standard deviations from the mean.
- Has lower control line - a tick box to indicate whether a lower control line should appear on the graph. This is a lower tolerance below the regression line that has been created between the two datasets.
- Lower limit - the lower limit tolerance that should be used - can be a fixed number of kWh, a set percentage or a number of standard deviations from the mean.
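The three tolerance options can be sketched as follows. This is an assumption-laden illustration rather than Sigma's calculation: the function name and mode labels are hypothetical, the percentage is assumed to be taken of the value on the regression line, and the standard deviation is assumed to be that of the deviations of the data points from the regression line.

```python
from statistics import stdev


def tolerance(predicted, mode, amount, deviations=None):
    """Size of the control band on one side of the regression line.

    mode is one of the hypothetical labels "fixed" (kWh), "percent" or "stdev".
    """
    if mode == "fixed":
        return amount
    if mode == "percent":
        return abs(predicted) * amount / 100
    if mode == "stdev":
        # Assumed: standard deviation of the points' deviations from the line.
        return amount * stdev(deviations)
    raise ValueError(f"unknown tolerance mode: {mode}")


predicted_kwh = 1000                  # value on the regression line (sample)
deviations = [-20, 10, 30, -10, -10]  # sample deviations from the line

# Upper and lower control lines can use different settings.
upper_line = predicted_kwh + tolerance(predicted_kwh, "percent", 10)
lower_line = predicted_kwh - tolerance(predicted_kwh, "fixed", 50)
two_sigma = tolerance(predicted_kwh, "stdev", 2, deviations)
```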
Editing Regression Lines
Right-clicking on a Regression Line and selecting Edit will show the same pop-up outlined in "Creating New Regression Lines" above and allow the same details to be updated for the existing Regression Line.
Removing Regression Lines
Right-clicking on a Regression Line and selecting Remove will show the following confirmation pop-up.
Clicking "Yes" will remove the Regression Line.
Selected Points - Including/Excluding Data Points
Specific data points can be excluded from the regression correlation, either individually or in bulk, in a variety of ways. This is useful for excluding outliers that may significantly skew the baseline performance.
Excluding points from the graph will recalculate the Regression Line.
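The effect of an exclusion can be sketched by fitting a straight line only to the points flagged as included. This is a minimal illustration with hypothetical data and field names, not Sigma's code; it assumes an ordinary least-squares fit.

```python
def fit_included(points):
    """Least-squares line through the points whose 'included' flag is True."""
    used = [(p["x"], p["y"]) for p in points if p["included"]]
    n = len(used)
    mean_x = sum(x for x, _ in used) / n
    mean_y = sum(y for _, y in used) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in used)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in used)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


# Hypothetical data: four points on y = 2x + 10 and one outlier.
points = [
    {"x": 1, "y": 12, "included": True},
    {"x": 2, "y": 14, "included": True},
    {"x": 3, "y": 16, "included": True},
    {"x": 4, "y": 18, "included": True},
    {"x": 5, "y": 50, "included": True},  # outlier
]

line_with_outlier = fit_included(points)

# Excluding the outlier changes the fitted gradient and intercept.
points[4]["included"] = False
line_without_outlier = fit_included(points)
```

In this sample, the single outlier pulls the gradient far from the trend of the other four points, which is why excluding it changes the baseline so noticeably.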
This is managed via the Selected Points component of the screen, which can be seen on the left-hand side. Points can be excluded using the graph, using the time filter feature or by updating the tabular data.
Using the Graph
Points can be selected on the graph individually (holding down the CTRL key to select more than one) or by left-clicking and dragging the mouse over the applicable data points. This places a black border around the selected points and adds each of them as a separate row in the Selected Points pane.
Clicking the "Exclude" button will then exclude the selected data points from the regression and change them to a red colour, which shows visually which points are excluded.
Clicking the "Include" button will re-enable selected data points that have previously been excluded.
Clicking "Clear Selection" will de-select any points that are highlighted and listed in the Selected Points pane.
Using the Time Filter
The Time Filter feature can be used to quickly remove data points which relate to particular time periods. It is only available where an interval of one day, one hour or half an hour is used.
The view available will be applicable to the interval that has been used when creating the regression. For example, if a daily interval is used, then the days of the week would be available for inclusion/exclusion. Alternatively, if one hour was used, then a grid of the hourly time bands in the context of each day would be included where each 'cell' could be included or excluded.
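For a daily interval, the filter behaves like a rule that switches off every data point falling on the chosen days of the week. A minimal sketch of that idea, using hypothetical field names and sample data (not Sigma's implementation):

```python
from datetime import date


def apply_day_filter(points, excluded_weekdays):
    """Return copies of the points with 'included' set from the weekday filter.

    Weekdays follow date.weekday(): Monday is 0 and Sunday is 6.
    """
    return [
        {**p, "included": p["date"].weekday() not in excluded_weekdays}
        for p in points
    ]


# Hypothetical daily data for one week, Monday 4 to Sunday 10 February 2019.
points = [
    {"date": date(2019, 2, day), "kwh": 100 + day, "included": True}
    for day in range(4, 11)
]

# Exclude weekends (Saturday = 5, Sunday = 6) from the regression.
filtered = apply_day_filter(points, excluded_weekdays={5, 6})
excluded_dates = [p["date"] for p in filtered if not p["included"]]
```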
Using the Table Data
The list of data points can be viewed in tabular form in the "Table" tab, which is described below.
Table Tab
The screen defaults to showing the data in a graph. This can be changed to show a list of all the data points on that graph in tabular form.
Click on the "Table" tab at the top of the screen to display a list of the data points.
For each data point the table provides:
- Date (and time if applicable)
- X value - the value of the independent variable
- Y value - the value of the dependent variable
- Y deviation - the difference between the actual value and the expected performance based on the regression line (e.g. the regression line expects the consumption to be 50 based on the production output being 10. The actual value is 75, so the Y deviation would be 25).
- A positive value indicates deviation above the regression line, a negative value indicates deviation below the regression line
- Included - a tickbox which shows whether or not the data point is included in the regression. These can be selected or deselected as required.
Cumulative Sum (CuSUM)
Once the Regression Correlation has been created and the expected performance determined, the CuSUM control chart can be used to view the performance of the dependent variable over time. More details about this can be seen on the introduction page here.
This can be accessed by clicking on either of the "CUSUM" links at the top of the screen:
Chart Tab
The graph shows the cumulative data based on the CuSUM period.
This is the cumulative sum, over time, of the differences between the actual value and the value predicted by the regression (actual minus predicted). For example, if the regression expects the consumption to be 50 based on the production output being 10 and the actual value is 75, the difference is 25. This data can be seen in the Table view, below.
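The calculation can be sketched as a running total of actual minus predicted values. The regression line and data below are hypothetical, chosen so that the first point matches the example of an expected value of 50 and an actual value of 75.

```python
from itertools import accumulate


def cusum(xs, ys, gradient, intercept):
    """Running total of (actual - predicted) for each data point in date order."""
    differences = [y - (gradient * x + intercept) for x, y in zip(xs, ys)]
    return list(accumulate(differences))


# Hypothetical regression line: consumption = 4 * production + 10.
production = [10, 12, 8, 11]
actual_consumption = [75, 58, 40, 60]

# Predicted values are 50, 58, 42 and 54, so the differences are 25, 0, -2 and 6.
cusum_values = cusum(production, actual_consumption, gradient=4, intercept=10)
```

Each value in the result is the sum of all differences up to and including that point, which is what the CuSUM line plots.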
Clicking and dragging your mouse over the graph will zoom into the data. Clicking the "Reset Zoom" button that appears will reset the graph to what it was.
The chart highlights step changes in performance over time and periods of significant performance degradation (i.e. the line on the graph goes up) or improvement (i.e. the line on the graph goes down). This trend over time may otherwise be hidden in the graph that was generated to create the Regression. The CuSUM chart gives a much better view of performance over time: changes in the direction of the line indicate events that are relevant to the energy consumption pattern.
In the example above, reviewing the chart quickly shows a step change in performance starting in February 2019, as per the annotation below, which isn't immediately obvious in the general correlation. The line goes down sharply after a prior consistent rising trend. This might trigger an investigation into what changed at the start of February 2019 and lead to corrective action being taken to optimise performance. Subsequently, the same method would be used to re-assess the performance over the updated period and validate that the performance has improved and the building is operating optimally. If corrective action was taken and a new Regression Line was created to also include the new period of time, then you would expect to see the line start going back "up" after the change had been made to bring performance back on track.
This technique can then be used on an ongoing basis in the continuous improvement lifecycle to help identify and resolve issues effectively.
Overview
- The Overview chart shows the dependent data and independent data for the CuSUM period.
- Dragging your mouse over this chart will update the CuSUM period displayed, and automatically update the CuSUM Period dates.
- If the CUSUM period is different to the Regression Period then you will see two colours highlighted on the overview. Blue is the Regression Period and purple the CUSUM Period.
- The Overview Graph can be minimised by clicking the Overview header bar
Details
This section shows a number of details relating to the CuSUM that has been established.
- CuSUM - This is the sum of the cumulative differences between the applicable dates in the CuSUM Period
- CuSUM CO2 - This is the sum of the cumulative differences of CO2 emissions between the applicable dates in the CuSUM Period
CuSUM and CuSUM CO2 will always be 0.00 when moving straight from Regression without changing the CuSUM period. When the CuSUM period is changed, anything above the 0 value based on the regression period will be calculated and listed in both kWh and CO2 emissions.
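One likely reason for the 0.00 value, assuming the regression line is an ordinary least-squares fit with an intercept and no custom gradient, custom intercept or excluded points (the manual does not state the fitting method), is that the differences from such a line always sum to zero over the period used to fit it. In the notation introduced here, $x_i$ and $y_i$ are the $n$ data points, $m$ is the gradient and $c$ is the intercept:

```latex
\hat{y}_i = m x_i + c, \qquad c = \bar{y} - m \bar{x}

\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)
  = \sum_{i=1}^{n} y_i - m \sum_{i=1}^{n} x_i - n c
  = n \bar{y} - m n \bar{x} - n \left( \bar{y} - m \bar{x} \right)
  = 0
```

Once the CuSUM period covers dates outside the regression period, the extra differences are no longer constrained in this way, so the total can move away from zero.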
Data Analysis
This section allows the creation of visual overlays which can be used to determine whether the performance shown in the CuSUM graph is out of control. A V-Mask is an overlay shape in the form of a V on its side that is superimposed on the graph of the cumulative sums. The origin point of the V-Mask (see diagram below) is placed on top of the latest cumulative sum point, and past points are examined to see if any fall above or below the sides of the V.
Tick the Show V-Mask box to add a V-Mask target onto the CuSUM.
To confirm target creation select "Yes" in the resulting popup window
By default a truncated V-Mask will be applied, adding a green layer to the graph.
The CuSUM points that now fall above the top arm or below the bottom arm of the V-Mask will be highlighted in red and shown on the graph as "exceptions".
There are 3 types of V-Mask targets available, as follows:
- Full mask
- Snub-nosed mask
- Truncated mask
The type of V-Mask that is used can be changed after a V-Mask has been created. This is explained in the "Targets" section below.
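The general idea of a V-Mask check can be sketched as below. This is a simplified illustration only: the `half_height` and `slope` parameters are hypothetical, and the sketch does not attempt to reproduce the full, snub-nosed or truncated variants or the settings Sigma uses. The mask is anchored on the latest CuSUM point and widens by a fixed amount for each step back in time; any earlier point lying outside the arms is reported as an exception.

```python
def v_mask_exceptions(cusum_values, half_height, slope):
    """Indices of CuSUM points lying outside a V-Mask anchored on the last point.

    half_height: half the opening of the mask at its origin (assumed parameter).
    slope: how much each arm moves away from the origin per step back in time.
    """
    origin_index = len(cusum_values) - 1
    origin_value = cusum_values[origin_index]
    exceptions = []
    for index, value in enumerate(cusum_values):
        steps_back = origin_index - index
        arm_distance = half_height + slope * steps_back
        # Outside the mask: above the top arm or below the bottom arm.
        if abs(value - origin_value) > arm_distance:
            exceptions.append(index)
    return exceptions


# Hypothetical CuSUM series with a sharp rise between the third and fourth points.
cusum_values = [0, 5, 12, 30, 33, 35, 36]
exceptions = v_mask_exceptions(cusum_values, half_height=4, slope=3)
```

With these sample values the three points before the sharp rise fall below the bottom arm of the mask, so they are the ones flagged.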
Tick Show Fixed Targets to add a CuSUM fixed target. To confirm target creation select "Yes" in the resulting popup window.
CuSUM Period
Use the start date and end date to choose the start and end of the period for the CuSUM, either by selecting from the drop-down boxes or by using the calendar icon.
Selecting "Refresh" will restrict the data points on the graph to only those that fall within the two dates.
Targets
This section displays any targets that have been created based on the CuSUM graph.
- To change any of the settings, right-click on the target and select Edit.
- To remove a target, right-click on the target and select Remove
If you edit a CuSUM V-Mask Target you will be presented with the following window. After changing any configuration, click "Recalculate" and the chart will be updated.
If you edit a CuSUM Fixed Target you will be presented with the following window. The target value can be entered as required and configured to end on a specific date. After changing any configuration, click "OK" and the chart will be updated.
Update the settings as required and click OK
The Targets will only create an event if the first point of data is outside the tolerance range by >5%, or if there are two consecutive points of data outside the tolerance range but within the 5% range.
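One possible reading of this rule is sketched below. The interpretation is an assumption, not a description of Sigma's logic: a point more than 5% beyond a limit raises an event immediately, while points beyond a limit by 5% or less raise an event only when two occur in a row. The percentage is assumed to be measured relative to the limit that was breached, the limits are assumed to be non-zero, and all names are hypothetical.

```python
def exceedance_percent(value, lower, upper):
    """How far a value lies outside [lower, upper], as a % of the breached limit."""
    if value > upper:
        return (value - upper) / abs(upper) * 100
    if value < lower:
        return (lower - value) / abs(lower) * 100
    return 0.0


def find_events(values, lower, upper, margin_percent=5.0):
    """Indices at which an event would be raised under the assumed rule."""
    events = []
    minor_run = 0  # consecutive points outside the range by margin_percent or less
    for index, value in enumerate(values):
        excess = exceedance_percent(value, lower, upper)
        if excess > margin_percent:
            events.append(index)
            minor_run = 0
        elif excess > 0:
            minor_run += 1
            if minor_run == 2:
                events.append(index)
                minor_run = 0
        else:
            minor_run = 0
    return events


# Hypothetical readings checked against a tolerance range of 90 to 110.
readings = [100, 112, 100, 111, 113, 120, 95]
events = find_events(readings, lower=90, upper=110)
```

In this sample, the isolated minor breach at 112 raises nothing, the two consecutive minor breaches (111 and 113) raise an event at the second of them, and 120 raises an event on its own.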
Table Tab
Select the "Table" tab to show the data points that are plotted on the graph.
The date displayed within the table is the end date for that period
The table and charts can be saved by click
Control Chart
It is not possible to go directly from Regression to the Control Chart, as it is designed as a stepped approach.
Navigate from Regression to the Control Chart via CuSUM by clicking the "CUSUM" tab, then the "Control Chart" tab.
Graph
The graph will display data for the same period that has been set in the CuSUM
The graph displays:
- Control Lines - accepted areas of data based on the control limit
- Exceptions - areas which fall outside of the control limit
- Difference line - data for the period being presented
Overview
The Overview Graph highlights the current target period by default and displays historic and future data
The Overview Graph can be minimised by clicking the Overview header bar
Targets
By default a New Control Target will be created for the user to amend as required. This target is not saved unless you click the Save button
The Targets section details the current control line parameters as well as allowing you to add and remove targets.
Control Lines
To amend the control lines that are applied, change the control limit.
Alternatively you can manually drag the control lines on the graph, to do this select and click and drag on the graph to set desired control limits
To remove the Control Limits from the graph uncheck .
The Targets will only create an event if the first point of data is outside the tolerance range by >5%, or if there are two consecutive points of data outside the tolerance range but within the 5% range.
© Copyright TEAM - Energy Auditing Agency Limited Registered Number 1916768 Website: www.teamenergy.com Telephone: +44 (0)1908 690018