Earlier this morning, @jonas2 mentioned that he was planning on using Reactor to detect spikes in the humidity data from his bathroom and turn on the fan automatically. By detecting spikes, he doesn’t need a fixed threshold. Because ambient humidity changes over the course of the year, a fixed threshold changes the sensitivity of detection and will likely need seasonal adjustment. For example, if your indoor humidity averages 33% in winter, as it does where I live, and 50% in summer, it’s hard to find a single threshold value that works all year: a threshold of 55% may be too high to catch a shower in winter, yet still cause nuisance fan runs on rainy summer days. So you end up having to tweak your threshold seasonally, and having to *remember* to do it. By detecting spikes, the system can ignore the gradual seasonal changes in humidity and is more likely to respond only to sudden changes, such as when someone is taking a hot shower.

So what @jonas2 needs is to implement a *time series*–another common HA task. By using a time series, it’s easy to find spikes in the data, and take action.

Let’s create a new ReactorSensor called “Bathroom Data Collector” and add some variable/expressions to it, to create a super-basic starting point:

- series =
`push( series, tonumber( getstate( "Bathroom Humidity Sensor", "urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel" ) ), 5 )`

- range =
`max(series) - min(series)`

And to our Reactor conditions, we’ll add only an *Interval* condition, with an interval of 1 minute. That’s short, but if you’re following along and creating this RS as you read, a short interval makes it easier to see changes happening as you watch it work; in a real-world implementation, I would likely use a 5- or 10-minute interval for this particular application.

Let’s look at the `series` expression first, since it does the real work. Working from the inside of the expression out, the `getstate()` function retrieves the value of a state variable on a device, in this case the current humidity level of our humidity sensor. The idea here is that our Interval condition will fire every minute, and when it fires, the expressions are re-evaluated, so this `getstate()` will grab the then-current value of the humidity sensor. The `getstate()` call is wrapped in a `tonumber()` to convert it to a true numeric value (it comes back from `getstate()` as a string), and that is in turn wrapped in a `push()`, which adds the fetched, converted value to an array: the first argument is the array (which is created if it doesn’t exist), the second argument is the value to be added to the array, and the third argument is the maximum allowed length of the array. I’ve set the maximum array length at 5, so that we keep only the five most recent results; additional pushes cause the oldest value to fall off the array, so it will never be longer than five elements. Since our interval is one minute, that means our array will hold five minutes’ worth of samples. The function result of `push()` is the new, updated array, which is stored in `series` (so it’s self-updating).
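To make the bounded-array behavior concrete, here is a minimal Python sketch of what I’ve described for `push()`. It is not Reactor code and not Reactor’s implementation; the Python `push` function and the sample readings are made up for illustration, and the readings are strings only to mimic `getstate()` returning a string.

```python
def push(series, value, max_len):
    """Return a new list with value appended, keeping only the newest max_len items."""
    updated = list(series or []) + [value]  # a missing array is created on first use
    return updated[-max_len:]               # the oldest values fall off the front


series = None
# Hypothetical readings, one per interval tick, as strings.
for reading in ["41", "42", "42", "43", "47", "52"]:
    series = push(series, float(reading), 5)  # float() stands in for tonumber()

print(series)  # [42.0, 42.0, 43.0, 47.0, 52.0]
```

Six readings went in, but only the five most recent remain, which is the “five-minute window” at a one-minute interval.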

Since we now have five minutes’ worth of humidity samples to look at, we can pretty easily find a spike in that data by finding the extrema of the array elements (the minimum and maximum) and computing the difference between them, which is what the `range` expression does. It uses the expression parser’s built-in `min()` and `max()` functions, passing in the array for each to scan, to make quick work of the job.

Now, to determine if we should turn on the fan, it’s a simple matter of examining `range` to see if it is greater than or equal to some threshold. For example, if we wanted to trigger our fan when the change in humidity is 4% or more in our five-minute period, we would simply check to see if `range >= 4` (do this in another RS; see further comment below).
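As a rough illustration of why the range test is season-proof, here is a small Python sketch of the same check. The function names and the sample windows are hypothetical; only the `max - min` computation and the `>= 4` test come from the expressions above.

```python
def humidity_range(series):
    """Difference between the largest and smallest sample in the window."""
    return max(series) - min(series) if series else 0


def is_spike(series, threshold=4):
    """True when humidity moved by at least `threshold` points within the window."""
    return humidity_range(series) >= threshold


# Made-up five-sample windows for illustration.
winter_idle = [33, 33, 34, 33, 34]
winter_shower = [33, 34, 36, 39, 41]
summer_idle = [50, 50, 51, 50, 51]

for name, window in [("winter idle", winter_idle),
                     ("winter shower", winter_shower),
                     ("summer idle", summer_idle)]:
    print(name, humidity_range(window), is_spike(window))
```

Note that the winter shower never gets anywhere near a fixed 55% threshold, but its range of 8 trips the spike test easily, while both idle windows stay quiet.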

But there are two minor problems with this simple implementation. First, Reactor’s `getstate()` function forces an immediate re-evaluation of the sensor configuration whenever the subject device state variable changes, and this causes an undesirable side-effect: whenever the humidity sensor updates itself, its state variable changes, and this causes the RS to re-evaluate the conditions and store an additional element in the array between our desired intervals, which may skew our data. More important, though, is that our Interval condition has two states, *true* and *false*, and each transition between them causes our expressions to be re-evaluated, so they will be evaluated *twice* per interval. That means the configuration we have created will actually store *two* humidity samples per minute (one when the condition goes *false* to *true* and one when the condition goes *true* to *false*), and that will certainly goof up our data.

We can easily fix this by having the `series` expression only update the array when the Interval condition is *true*. This will filter out both the unpredictable updates from the humidity sensor itself, and the updates caused by the “falling edge” reset (*true* to *false* transition) of the Interval condition itself. Here’s how we change our expressions to do that:

- isint =
`getstate( "Bathroom Data Collector", "urn:toggledbits-com:serviceId:ReactorGroup", "GroupStatus_root" )`

- series =
`if( isint=="1", push( series, tonumber( getstate( "Bathroom Humidity Sensor", "urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel" ) ), 5 ), series )`

- range =
`max(series)-min(series)`

The new `isint` variable is “self-reflective”: it’s looking at its own ReactorSensor to get the status of the root group, which contains (only) our Interval condition. So when `isint` goes to “1”, the Interval condition has gone *true* and the re-evaluation is caused by that event, and not by any external update/trigger or the interval’s reset.

So then all we need to do is wrap our `series` expression in an `if()` function. The `if()` function takes three arguments: a conditional expression (which evaluates to *true* or *false*), an expression to be evaluated if the conditional is *true*, and an expression to be evaluated if the conditional is *false*. We make our `push()` expression the *true* expression, and we supply `series` as the *false* expression. That last bit is important, because all expressions return values, so if it’s not time to update our `series` array, we must return `series` as it currently is, thus setting it back to itself. So basically, with the `if()` in place, if we’re updating on the interval trigger (`isint=="1"`), we now return the array with a new value pushed, and if not, we return the array untouched.

That takes care of the side-effects of re-evaluations that can happen off the interval’s schedule, and we should be good to go.
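If it helps to see the difference, here is a small Python simulation of the gated and ungated versions side by side. This is only a sketch: the event list is invented, the Python `push` is a stand-in for the Reactor function, and I’m assuming the stray sensor update arrives while the root group status is “0”.

```python
def push(series, value, max_len):
    """Stand-in for the bounded push: append, then keep the newest max_len items."""
    return (list(series) + [value])[-max_len:]


# Each tuple is one re-evaluation: (cause, root group status, humidity as a string).
evaluations = [
    ("interval rising edge", "1", "40"),
    ("interval falling edge", "0", "40"),
    ("sensor update", "0", "41"),
    ("interval rising edge", "1", "41"),
    ("interval falling edge", "0", "41"),
]

ungated, gated = [], []
for _cause, isint, humidity in evaluations:
    # First version of the expression: push on every re-evaluation.
    ungated = push(ungated, float(humidity), 5)
    # Gated version: push only when isint == "1", otherwise return the array as-is.
    gated = push(gated, float(humidity), 5) if isint == "1" else gated

print(ungated)  # five entries from only two interval ticks
print(gated)    # one entry per interval tick
```

Two interval ticks fill the entire ungated array, so its “five-minute” window really covers about two minutes; the gated array holds exactly one sample per tick.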

One more important implementation note: because of the evaluations performed and the importance of their timing, I would do the series data collection shown here in a separate ReactorSensor from any logic. This RS is only a data collector. Let the decisions be made elsewhere. This will help prevent excessive (and potentially unnecessary) evaluations.

Now… tuning. There are three factors in tuning this time-series generator:

- The interval at which samples are collected (sampling interval);
- The total length (in time) of the series;
- The threshold for reaction.

The sampling interval is, predictably, controlled by the Interval condition. If you want samples collected every 15 minutes, then you would set the Interval condition for 15 minutes; if you wanted samples hourly, then you would set it for one hour.

The total length in time of the series is controlled by two factors: the sampling interval, and the maximum length of the `series` array, which is set by the third argument to `push()`. If our sampling interval is 15 minutes, and our maximum array length is 4, then our total time is 4 * 15 = 60 minutes. If our sampling interval is two hours, and our array length is 12, then the total time is 2 * 12 = 24 hours.
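The arithmetic is simple enough to do in your head, but as a quick sanity check when you’re trying different combinations, it can be written as a tiny Python helper (the function name is just something I made up for this sketch):

```python
def window_minutes(interval_minutes, max_len):
    """Total time covered by a full series: sampling interval times array length."""
    return interval_minutes * max_len


print(window_minutes(1, 5))     # the example in this post: five minutes
print(window_minutes(15, 4))    # 15-minute samples, 4 deep: one hour
print(window_minutes(120, 12))  # two-hour samples, 12 deep: one day, in minutes
```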

The last tuning element is the threshold, and this is something you will just have to “feel out” based on the parameters of your room and devices, and your sensibilities. It may take a little experimentation to figure out the right value. In this case, we might watch `range` while running the shower a bit and see how it changes. Or, you could add another expression with `push()` that stores the last 100 values of `range` for you to review.

We can also get as complicated as we want with the math on this data. Let’s add a couple of expressions, in the order shown:

- lastMA =
`if( isint=="1", MA, lastMA )`

- MA =
`sum(series) / count(series)`

- deltaMA =
`abs( MA - lastMA )`

Here, `MA` is the computed average of the series data, a simple moving average. The expression before it, `lastMA`, will, on the interval as we did with `series`, store the current value of `MA`. Because `lastMA` is evaluated before `MA`, `lastMA` will store the previously computed value of `MA` before `MA` then changes. Finally, `deltaMA` gives us the difference between the two. For some series, you may need to “boil out” small spikes in the data so you can focus on the really big ones you care about, and using a moving average such as this can help you do that. But the idea here is to show that once you have the data in the `series` array, you can do other computations with it, not just the min/max shown in the original examples.
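The evaluation order is the subtle part, so here is a Python sketch that walks through it on interval ticks only. The readings are made up, the Python names mirror the expressions but are otherwise my own, and treating the first tick’s delta as zero is an assumption for the sketch, not something Reactor defines.

```python
def push(series, value, max_len):
    """Stand-in for the bounded push: append, then keep the newest max_len items."""
    return (list(series) + [value])[-max_len:]


series, ma, last_ma = [], None, None
deltas = []

# Hypothetical readings, one per interval tick; the last one is a spike.
for reading in [40.0, 40.0, 41.0, 40.0, 50.0]:
    series = push(series, reading, 5)
    last_ma = ma                    # lastMA runs first, so it captures the old MA
    ma = sum(series) / len(series)  # MA is then recomputed from the updated series
    # deltaMA: change in the moving average (0.0 on the first tick, by assumption).
    delta_ma = abs(ma - last_ma) if last_ma is not None else 0.0
    deltas.append(delta_ma)

print(ma, last_ma, delta_ma)
```

The one-point wobble on the third tick moves the average by only about a third of a point, while the jump to 50 moves it by almost two points, which is the “boiling out” effect at work.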

EDIT 2019-08-24: Several changes made above, in the text directly rather than as deltas here, for ease of reading/continuity. Most importantly, remember that `getstate()` always returns a *string*, even when the value being retrieved is (conceptually) a number. If you want to use it as a true number, you have to wrap your `getstate()` call with a `tonumber()` call.

EDIT 2020-12-28: The `arraypush()` function is now deprecated, so these instructions have been amended to show the new `push()` function replacement.