DataYours: a prototype next-generation dataMine

[b]
This is all very out of date and only an initial proof of concept. You shouldn’t be using the code on this thread, but the one from the beta test thread here: [url=http://forum.micasaverde.com/index.php/topic,24669.msg171205.html#msg171205]http://forum.micasaverde.com/index.php/topic,24669.msg171205.html#msg171205[/url][/b]

Here’s a prototype of some of the major components of a next-generation dataMine, along the lines of the design criteria I outlined here: [url=http://forum.micasaverde.com/index.php/topic,17232.msg154658.html#msg154658]http://forum.micasaverde.com/index.php/topic,17232.msg154658.html#msg154658[/url]

When installed and configured, this demo should:

[ul][li]watch a few key variables: temperature, security, power, and battery levels[/li]
[li]store them in a Whisper database (yes, on Vera itself, but the files are very small)[/li]
[li]enable plotting of these data from this database and from a dataMine one (if present)[/li][/ul]

To install, unzip and load the files in the usual way, then run this script in the Test Luup code (Lua) window:

------
--
-- DataYours - a prototype next-generation datamine
-- version = 2014.02.08  @akbooer
--
--

-- load and start the three component daemons

require "L_DataWatcher"
require "L_DataCache"
require "L_DataGraph"

local IP   = "127.0.0.1"            -- this machine
local PORT = "1392"                 -- unassigned port
local DATAMINE = "/dataMine/"       -- dataMine database root

--local SYSLOG = "xxx.xx.xx.xxx:yyy"  -- set if you want to log to syslog rather than Vera's log file

local function DWconfig (x) luup.inet.wget ("http://"..IP..":3480/data_request?id=lr_DataWatcher&"..x, 1) end
local function DCconfig (x) luup.inet.wget ("http://"..IP..":3480/data_request?id=lr_DataCache&"..x,   1) end
local function DGconfig (x) luup.inet.wget ("http://"..IP..":3480/data_request?id=lr_DataGraph&"..x,   1) end

-- create a directory for the Whisper database

os.execute "mkdir /whisper/"

-- configure DataWatcher to watch all Temperature, Security, Power and Batteries and send to specific port

DWconfig ("send="..IP..":"..PORT)
if SYSLOG then DWconfig ("syslog="..SYSLOG) end

DWconfig "watch=*.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature"
DWconfig "watch=*.urn:micasaverde-com:serviceId:SecuritySensor1.Tripped"
DWconfig "watch=*.urn:micasaverde-com:serviceId:EnergyMetering1.Watts"
DWconfig "watch=*.urn:micasaverde-com:serviceId:HaDevice1.BatteryLevel"

-- configure DataCache to listen and store the results

DCconfig ("listen="..PORT)
if SYSLOG then DCconfig ("syslog="..SYSLOG) end

-- configure DataGraph to read Whisper and dataMine databases

if SYSLOG then DGconfig ("syslog="..SYSLOG) end

DGconfig ("datamine="..DATAMINE)

--
-- Congratulations! You should now be logging specified device data
--
-- to plot, visit, for example, the URL 
-- <yourVeraIP>:3480/data_request?id=lr_render&target=Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature
--

The configuration may take about 10 seconds to run. The script should be easy to understand and modify (for example, to add new variables.) Again, this is only a prototype, so in future this will all be hidden behind a GUI. Once configured, the settings are persistent, so you don’t have to run it again. In order to start the prototype after every reload, simply include the three [tt]‘require’[/tt] lines in Startup Lua.

To test, you’ll have to wait a while for some data to become available. The database is located in [tt]/whisper/[/tt] with one file per variable, so you should be able to see the total size after running the script. The individual files never get any bigger (because this is a ‘round-robin’ database). They are all limited to just one week duration, except for battery levels which are one year (one point a day). To plot, you can just browse URLs like this:

<yourVeraIP>:3480/data_request?id=lr_render&target=Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature

The target names have four dot-separated parts, as in the example above: the Vera identifier, the device number, the serviceId, and the variable name. The exact naming and numbering can be seen from the [tt]/whisper/[/tt] directory, EXCEPT that in the serviceId colons ‘:’ have been replaced by carets ‘^’ because of filename restrictions on some operating systems.
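As a minimal sketch of that colon/caret substitution (the function names are hypothetical and not part of DataYours, and no file extension is assumed), the mapping between a plot target and the name seen in [tt]/whisper/[/tt] could look like this:

```lua
-- Sketch only: the ':' <-> '^' substitution is described in the post;
-- the function names here are hypothetical.

-- plot target -> name as stored on disk
local function target_to_filename (target)
  return (target:gsub (":", "^"))      -- extra parentheses drop gsub's count
end

-- name as stored on disk -> plot target
local function filename_to_target (name)
  return (name:gsub ("%^", ":"))       -- '^' must be escaped in a Lua pattern
end

local target = "Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature"
print (target_to_filename (target))
```

So when copying a name from the directory listing into a render URL, swap the carets back to colons first.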

I’m very keen to get feedback on the operational aspects of this prototype which I have had running on my own systems for over a week. I’ll follow up with more details in posts on each one of the components.

This prototype design is based heavily on the Graphite system (itself written in Python). There is some great documentation for this here [url=https://graphite.readthedocs.org/en/latest/overview.html]https://graphite.readthedocs.org/en/latest/overview.html[/url], and much of the high-level information is valid here. The database itself is a round-robin style, called Whisper [url=https://graphite.readthedocs.org/en/latest/whisper.html]https://graphite.readthedocs.org/en/latest/whisper.html[/url] with fixed maximum duration, but possibly varying resolution as data ages (see the docs!)

All I have done is to translate the Whisper code, originally written in Python, to Lua. This is only partially done at present, with the basic create / update / fetch routines, but none of the more advanced features required for an industrial-strength system (which Graphite certainly is.) The database code is pure Lua and will run anywhere, but it is not binary-compatible with Graphite Whisper files because I have chosen CSV rather than binary packing. This makes them exactly three times larger than real Whisper ones, but space (outside of Vera) is not a problem.

The Graphite daemons, which provide all of the original system’s functionality, have not been ported because they have very heavy external dependencies and a complex install process - quite unsuited to the Vera environment. What I have done here is to reverse-engineer their basic functionality whilst endeavoring to maintain their external interfaces. It should be quite possible, therefore, to interface this prototype with a real Graphite system and use all its plotting utilities (and third-party ones) without having to re-invent the wheel. Not everyone will want to do this, however, so I envisage some further GUI work to integrate better with Vera.

The whole thing is very modular. There are three major components (daemons) at present:

[ol][li][tt]DataWatcher[/tt] - simply watches for variable changes and sends them on[/li]
[li][tt]DataCache[/tt] - an implementation of Graphite’s [tt]carbon-cache[/tt] which receives and stores data in Whisper[/li]
[li][tt]DataGraph[/tt] - an implementation of Graphite’s [tt]Web App[/tt] which plots results from the database[/li][/ol]

All data passes between components in UDP datagrams in the standard Whisper plaintext format, so in fact all the components can be running on different machines (not just a Vera). I have three machines with three DataWatchers each feeding two DataCache instances on different machines and DataGraph plotting from another machine accessing the data stored on a NAS. As you can see from the script, all the daemons also listen for HTTP configuration requests to set up key parameters like IP addresses and ports to use. Both data and diagnostic messages can be written to an external syslog server instead of Vera’s log file. If you just wanted to send data to syslog, then the only component you need is DataWatcher.
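For anyone wanting to feed or read these datagrams from another machine, here is a rough sketch of the message format. It assumes the datagrams follow Graphite's usual plaintext line (path, value and Unix timestamp separated by spaces, ending in a newline); the [tt]encode[/tt] and [tt]decode[/tt] names are hypothetical and this is not the DataYours code itself:

```lua
-- Sketch only, assuming Graphite's plaintext line format:
--   <path> <value> <timestamp>\n
-- encode/decode are hypothetical helper names.

local function encode (path, value, timestamp)
  return string.format ("%s %s %d\n", path, tostring (value), timestamp)
end

local function decode (datagram)
  local path, value, timestamp = datagram:match "^(%S+)%s+(%S+)%s+(%d+)%s*$"
  if not path then return nil, "malformed datagram" end
  return path, tonumber (value), tonumber (timestamp)
end

local msg = encode ("Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature",
                    21.5, 1392000000)
print (decode (msg))
```

Actually sending the string over UDP needs a socket library, which is left out here to keep the sketch self-contained.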

I’ll follow up with details on each of the above daemons.

All of the components run as autonomous (and asynchronous) daemons - not taking their own additional stack space or presenting any device features. They are small and, hopefully, efficient. A single framework module [tt]DataDaemon[/tt] provides all its clients (DataWatcher, DataCache, etc.) with some basic features:

[ol][li]HTTP URL command line interface for configuration[/li]
[li]persistence of configuration parameters across Luup restarts and Vera reboots[/li]
[li]individual configuration commands and files for each client[/li]
[li]a listener callback for incoming UDP datagrams (containing pathname, value, and timestamps)[/li]
[li]a UDP sender call for data to be sent to one or two remote listeners (so it can duplicate incoming data)[/li]
[li]optional logging of data and debug information to a remote syslog server[/li][/ol]

The URL configuration interface supports a number of commands to support this functionality. All configuration commands are sent to the daemon as URL GET requests of the following form:

http://<yourVeraIP>:3480/data_request?id=lr_<Name>&<a=b>

where [tt]Name[/tt] is the name of the client and [tt]<a=b>[/tt] is one of:

[ul][li][tt]send=xxx.xx.xx.xxx:yyy[/tt] - configures UDP IP and PORT for variable changes to be sent (Whisper plaintext format)[/li]
[li][tt]send2=xxx.xx.xx.xxx:yyy[/tt] - configures UDP IP and PORT replication of the above[/li]
[li][tt]listen=port[/tt] - configures a UDP port number on which to listen for incoming data[/li]
[li][tt]syslog=xxx.xx.xx.xxx:yyy[/tt] - configures UDP IP and PORT to echo above changes to a syslog (plus some status messages)[/li]
[li][tt]systag=tagname[/tt] - specifies tagname to identify syslog output (default is the client name)[/li]
[li][tt]show=config[/tt] - lists current internal configuration[/li]
[li][tt]show=client[/tt] - lists current client-specific configuration[/li]
[li][tt]anythingElse=something[/tt] - passed to the client to define its specific configuration parameters[/li][/ul]

These are the commands that the configuration script in the [tt]DataYours[/tt] demo uses to join up the various components.
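To make the URL form concrete, here is a small sketch that only builds the request strings. The [tt]config_url[/tt] helper is a hypothetical name; on Vera each string would then be passed to [tt]luup.inet.wget[/tt], as the demo script does:

```lua
-- Sketch only: builds configuration URLs of the form described above.
-- config_url is a hypothetical helper, not part of DataDaemon.

local function config_url (ip, name, key, value)
  return ("http://%s:3480/data_request?id=lr_%s&%s=%s"):format (ip, name, key, value)
end

-- the same wiring as the demo script: DataWatcher sends, DataCache listens
local IP, PORT = "127.0.0.1", "1392"
local requests = {
  config_url (IP, "DataWatcher", "send",   IP .. ":" .. PORT),
  config_url (IP, "DataCache",   "listen", PORT),
  config_url (IP, "DataWatcher", "show",   "config"),
}
for _, url in ipairs (requests) do print (url) end
```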

Both the generic daemon configuration and the client-specific configuration parameters are kept in a client-specific file which is run as a script at startup. Currently these files are named like [tt]DataWatcher.config[/tt] etc., and stored in [tt]/www/[/tt], so are available to any web browser from the normal Vera IP address.

This basic framework provides all the I/O and persistent context that the various client modules need. Although it sounds heavy, it isn’t, taking only about 200 lines of Lua/Luup code. Next I’ll document the specifics of each of the three (at the moment) clients of this basic module.

You never stop to amaze me akbooer!

[tt]DataWatcher[/tt] has no counterpart in the Graphite system on which this prototype is based, because it is simply the data collection front-end. A couple of URL command requests allow configuration of which variables to watch.

Device variables are uniquely specified on the command line by three dot-separated parts (device number, serviceId, and variable name):

http://<yourVeraIP>:3480/data_request?id=lr_DataWatcher&watch=321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature

which starts watching that variable.

Wildcard ‘*’ device numbers may be used to specify all devices with that serviceId and variable name:

http://<yourVeraIP>:3480/data_request?id=lr_DataWatcher&watch=*.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature

A second command [tt]nowatch[/tt] stops logging of a specific variable:

http://<yourVeraIP>:3480/data_request?id=lr_DataWatcher&nowatch=321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature

This is useful to deselect unwanted devices picked up when using wildcard device numbers, particularly if being used interactively.
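A rough sketch of how wildcard [tt]watch[/tt] and specific [tt]nowatch[/tt] entries might combine is below. This is only an illustration under my own assumptions (that [tt]nowatch[/tt] takes priority, and that serviceIds contain no dots); the function names are hypothetical and the real DataWatcher logic may differ:

```lua
-- Sketch only: hypothetical matching of watch / nowatch specifications.
-- Assumes serviceIds contain no '.' and that nowatch entries take priority.

-- split "device.serviceId.variable" into its three parts
local function parse_spec (spec)
  return spec:match "^([^%.]+)%.(.+)%.([^%.]+)$"
end

local function matches (spec, devNo, serviceId, variable)
  local dev, srv, var = parse_spec (spec)
  if not dev then return false end
  return (dev == "*" or tonumber (dev) == devNo)
     and srv == serviceId and var == variable
end

local function is_watched (watch, nowatch, devNo, serviceId, variable)
  for _, spec in ipairs (nowatch) do
    if matches (spec, devNo, serviceId, variable) then return false end
  end
  for _, spec in ipairs (watch) do
    if matches (spec, devNo, serviceId, variable) then return true end
  end
  return false
end

local T = "urn:upnp-org:serviceId:TemperatureSensor1"
local watch   = { "*."   .. T .. ".CurrentTemperature" }
local nowatch = { "321." .. T .. ".CurrentTemperature" }
print (is_watched (watch, nowatch, 5,   T, "CurrentTemperature"))
print (is_watched (watch, nowatch, 321, T, "CurrentTemperature"))
```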

Using the generic [tt]send, send2[/tt] and [tt]syslog[/tt] commands to configure those UDP addresses will start [tt]DataWatcher[/tt] sending Whisper plaintext format datagrams (see the Graphite documentation) to those destinations for each incoming watched variable change. The UDP format is very lightweight, uses minimal network and CPU resources, but doesn’t guarantee receipt of the message since there is no handshake. However, it’s very widely used for non-critical data and it is as reliable as the underlying network (which, on an internal LAN in particular, should be very reliable).

The [tt]DataWatcher[/tt] daemon is really simple, taking less than 100 lines of Lua code to add its specific functionality to the generic daemon code. If all you wanted to do was send variable changes to syslog or some other UDP server then [tt]DataWatcher[/tt] is all you need.

[tt]DataCache[/tt] is a look-alike for Graphite’s [tt]carbon-cache[/tt] daemon. It simply listens for incoming data and stores it in the Whisper database.

Actually, it’s not quite that simple, because there could be some complex configuration decisions to be specified (see the Graphite Carbon documentation to believe it: [url=http://graphite.readthedocs.org/en/latest/config-carbon.html]http://graphite.readthedocs.org/en/latest/config-carbon.html[/url])

The incoming datagram specifies all the metadata that is needed to know where to store it: the dot-separated path of the data source is simply mapped to a filename. (I’ve chosen not to replicate the directory tree which the original [tt]carbon-cache[/tt] uses. It’s designed for tens of thousands of variables, but we’ll only be storing a few hundred at maximum, so [tt]DataCache[/tt] stores all the Whisper files in a single directory, making administration and backup really very easy.)

But what about new data? Whisper files have different time resolutions and data retentions, possibly containing a number of archives with progressively longer retention times and lower resolutions (see the docs: [url=http://graphite.readthedocs.org/en/latest/whisper.html]http://graphite.readthedocs.org/en/latest/whisper.html[/url]) and aggregation functions to apply when down-sampling. These have to be specified initially for a new file to be created. In [tt]carbon-cache[/tt] there are a number of different configuration rule sets which define these which are not yet implemented in [tt]DataCache[/tt].
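To illustrate what an aggregation function does when down-sampling, here is a simplified sketch. It is not the Whisper code: it ignores missing points and timestamps, [tt]downsample[/tt] is a hypothetical name, and only the "sum" and "last" methods are shown:

```lua
-- Sketch only: collapse every 'factor' high-resolution points into one
-- lower-resolution point. Missing data and timestamps are ignored.

local aggregators = {
  sum  = function (xs) local s = 0; for _, x in ipairs (xs) do s = s + x end; return s end,
  last = function (xs) return xs[#xs] end,
}

local function downsample (points, factor, method)
  local aggregate = assert (aggregators[method], "unknown aggregation method")
  local result = {}
  for i = 1, #points, factor do
    local bucket = {}
    for j = i, math.min (i + factor - 1, #points) do
      bucket[#bucket + 1] = points[j]
    end
    result[#result + 1] = aggregate (bucket)
  end
  return result
end

-- six samples of a tripped flag, aggregated three at a time
local tripped = { 0, 1, 0, 0, 1, 1 }
print (table.concat (downsample (tripped, 3, "sum"),  ","))
print (table.concat (downsample (tripped, 3, "last"), ","))
```

Summing suits event counts (how many trips in the interval), while taking the last value suits slowly varying readings.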

For the [tt]DataYours[/tt] demo I have simply hard-wired defaults for different data types:

[ul][li]Power, Temperature, Humidity, Light, Generic sensors: every 30 minutes for a week, taking the last measurement in each interval[/li]
[li]Alarm, Security, DoorLocks: three archives per file with each interval being summed to aggregate to the next lower precision:
[list]
[li]one second sampling for a minute [/li]
[li]one minute sampling for an hour[/li]
[li]one hour sampling for a week[/li]
[/list]
[/li]
[li]Batteries: once a day for one year[/li]
[li]Anything else: hourly for one week[/li][/ul]

Most of these files, then, are time-limited to one week and take up typically 12K of disk space. The more complex multi-archive structure I’ve chosen for security sensors justifies a whole post in itself to really go into, but does serve to demonstrate the sophistication possible with Whisper file archives. The round-robin nature of the archives and their multi-resolution capability bring some fantastic possibilities.
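As a sanity check on those sizes, here is a sketch that counts the points in an archive definition. I've written the defaults in Graphite's retention notation (precision:duration, comma-separated), which is my own shorthand here and not necessarily how DataCache stores them. The 36 bytes per point is the per-sample figure quoted in this thread for the CSV format, and any file header overhead is ignored:

```lua
-- Sketch only: count points in Graphite-style retention strings.
-- BYTES_PER_POINT is the per-sample figure quoted in this thread; headers ignored.

local SECONDS = { s = 1, m = 60, h = 3600, d = 86400, w = 7 * 86400, y = 365 * 86400 }
local BYTES_PER_POINT = 36

local function seconds (text)
  local n, unit = text:match "^(%d+)([smhdwy])$"
  assert (n, "bad time specification: " .. text)
  return tonumber (n) * SECONDS[unit]
end

local function points (schema)
  local total = 0
  for precision, duration in schema:gmatch "([^:,]+):([^:,]+)" do
    total = total + math.floor (seconds (duration) / seconds (precision))
  end
  return total
end

local defaults = {
  { "sensors",   "30m:7d" },
  { "security",  "1s:1m,1m:1h,1h:7d" },
  { "batteries", "1d:1y" },
  { "other",     "1h:7d" },
}
for _, d in ipairs (defaults) do
  local n = points (d[2])
  print (d[1], n .. " points", n * BYTES_PER_POINT .. " bytes")
end
```

The 30-minute, one-week case comes to 336 points, which at 36 bytes each is about 12K, in line with the figure above.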

It’s really important to appreciate that you can’t simply scale file size requirements linearly if you are using this multi-archive feature, because older data are stored with lower time precision. As an example, on my system I am storing power usage in a file which has:

[ul][li]20 minute, for 30 days[/li]
[li]3 hourly, for 1 year[/li]
[li]once a day for 10 years[/li][/ul]

Stored at full 20 minute sampling, this would be 3*24*365*10 = 262,800 points; at 36 bytes a sample, that’s ~10 Mbytes (OK, not too big), but in this multi-scale archive it only takes about 300 KB - that’s 30 times less! You have to think carefully about how you will use the data in future before configuring an archive. At the moment [tt]DataCache[/tt] has no specific configuration parameters, and is only about 150 lines of code - this will change! All you need to do is set the UDP port on which to listen to incoming datagrams.
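The arithmetic behind that comparison, spelled out as a quick sketch (variable names are just for illustration; 36 bytes per point is the figure quoted above, and file headers are ignored):

```lua
-- Sketch only: compare one flat 10-year archive at 20-minute sampling
-- with the three-archive layout described above.

local BYTES_PER_POINT = 36

local archives = {
  { per_day = 72, days = 30   },   -- every 20 minutes for 30 days
  { per_day = 8,  days = 365  },   -- every 3 hours for 1 year
  { per_day = 1,  days = 3650 },   -- once a day for 10 years
}

local multi = 0
for _, a in ipairs (archives) do
  multi = multi + a.per_day * a.days
end

local flat = 3 * 24 * 365 * 10     -- 20-minute samples for 10 years

print ("flat archive:  ", flat,  flat  * BYTES_PER_POINT)
print ("multi archive: ", multi, multi * BYTES_PER_POINT)
print ("ratio:         ", flat / multi)
```

That is 262,800 points against 8,730, or roughly 9.5 million bytes against 314 thousand, a ratio of about 30.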

[tt]DataGraph[/tt] is the [tt]Graphite webapp[/tt] look-alike ([url=http://graphite.readthedocs.org/en/latest/render_api.html]http://graphite.readthedocs.org/en/latest/render_api.html[/url]). It reads data from the Whisper database and plots or prints it out in a variety of formats.

http://<yourVeraIP>:3480/data_request?id=lr_render&target=Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature

This is currently only a partial implementation, so following along with the Graphite render API documentation link, the exceptions are here below:

[ul][li]target - only one path (variable) allowed at the moment[/li]
[li]wildcards - sorry, not yet[/li]
[li]from/until - relative time (the most useful) works as described. For absolute time formats, Graphite regrettably did not use the ISO 8601 standard, which is what is currently implemented in [tt]DataGraph[/tt]. However, the good news is that the YYYYMMDD format is the same in both, so stick to that.[/li]
[li]format - supports [tt]svg[/tt] (the default), [tt]csv[/tt], and [tt]json[/tt], in exactly the formats described, except that the [tt]svg[/tt] option does not embed the metadata variable (makes no difference if you’re just looking at the plot.) [tt]raw[/tt], [tt]png[/tt], and [tt]pickle[/tt] are not supported.[/li]
[li]graph parameters - not (yet) supported[/li][/ul]
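Putting the supported parameters together, here is a sketch of building a render request. The [tt]render_url[/tt] helper and the IP address are hypothetical; only the [tt]target[/tt], [tt]from[/tt], [tt]until[/tt] and [tt]format[/tt] parameters listed above are used:

```lua
-- Sketch only: build a render URL from the parameters listed above.
-- render_url is a hypothetical helper; the IP address is a placeholder.

local function render_url (ip, target, options)
  local url = ("http://%s:3480/data_request?id=lr_render&target=%s"):format (ip, target)
  -- fixed order keeps the result predictable
  for _, name in ipairs { "from", "until", "format" } do
    local value = options and options[name]
    if value then url = url .. "&" .. name .. "=" .. value end
  end
  return url
end

local target = "Vera-12345678.321.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature"

-- note: 'until' is a Lua keyword, so it has to be written as ["until"] in a table
print (render_url ("192.168.1.10", target, { from = "-d", format = "csv" }))
print (render_url ("192.168.1.10", target, { from = "20140201", ["until"] = "20140208" }))
```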

In addition, this implementation also reads from a dataMine database, if configured to do so.

By the Graphite author’s own admission (Chris Davis, an excellent article on its architecture and development here [url=http://www.aosabook.org/en/graphite.html]http://www.aosabook.org/en/graphite.html[/url]) “Graphite’s URL API is currently a sub-par interface in my opinion. Options and functions have been tacked on over time, sometimes forming small islands of consistency, but overall lacking a global sense of consistency.” So although it might be a bit ugly, we’re going to have to stick to it if we want URL-level compatibility with third-party Graphite tools.

Aside from the generic daemon option to set syslog and the syslog tag, the only thing that needs to be done is to define the location of the dataMine database, if you want to plot data from both that and Whisper. (Any variable to be plotted has to be currently being logged in Whisper.) To do that, you need the command:

http://<yourVeraIP>:3480/data_request?id=lr_DataGraph&datamine=/dataMine/

All plots and data from [tt]DataGraph[/tt] are available even when accessing through the MiOS servers (using the appropriate authorization syntax.)

The DataDash daemon adds a prototype user interface to the mix. Here’s a snapshot of the simple dashboard that it provides. This shows all the variables being logged into Whisper on one screen, grouped by Vera and then by serviceId (and colour-coded) along with their device number. Mouse over on one of the rectangles and a tool-tip menu pops up with the name of the variable and options to plot it over the last day / week / month / year.

The dashboard is displayed with this URL command:

<yourVeraIP>:3480/data_request?id=lr_dashboard&width=650&height=450

If you click on any element, you move lower in the hierarchy… right-click to return to a higher level.

I am subscribing to this thread. Very promising!

[edit]
wow, just wow!

I like how you keep history to a low byte count by aggregation! Just like the conversion from Raw to jpg in photography has to be done with care before deleting the original, configuring DataYours will need some careful thinking and planning.

I wanted to use dataMine to see the relationship between outside temperature (Montreal, Canada region) and total energy consumption in order to see how changing heating habits would affect electricity bill. The graphs alone are pretty much useless, and dumping the raw data for external analysis isn’t easy (although a simple mod to dmDBServer would help), so I like what you are building!

At last! some feedback. Thanks.

Absolutely right. But you do have the tools with which to do it (ie. configuring the right archive retentions.)

[quote]I wanted to use dataMine to see the relationship between outside temperature (Montreal, Canada region) and total energy consumption in order to see how changing heating habits would affect electricity bill. The graphs alone are pretty much useless, and dumping the raw data for external analysis isn't easy (although a simple mod to dmDBServer would help), so I like what you are building![/quote]
Very interesting. I am doing the same for my ground source heatpump. I use the metric 'Heating Degree Days' and get a 0.98 correlation coefficient between that and my heat pump energy use - so the controls are excellent.

What dmDBserver mod were you after?

@akbooer
Great work so far. I will read up on Graphite.

@akbooer

I’ve been trying to free up some memory on my Veralite to test this, but based on what you’ve shared it looks gooooood…

I was thinking about implementing a simple report that dumps the data from a device in a table form, with an option for CSV.

I think this sort of functionality should be built-in the Vera as it adds a lot to the awesomeness factor.

Out of curiosity, what IDE or development environment are you using for your work, and how do you typically debug? I don’t want to derail the subject, but being from a Visual Studio/C background with experience on .NET and having played with eclipse/java, it would be nice for me to get my hands dirty on this sort of things. However, I am lazy and would appreciate if an IDE could help me type method/member names for known classes/object structures.

[quote]I was thinking about implementing a simple report that dumps the data from a device in a table form, with an option for CSV.[/quote]

Should be there already… do you see ‘format’ in the help text given by this request (I may be running a newer version than you):

<yourVeraIP>:3480/data_request?id=lr_dmDB

You ought to be able to add the option [tt]&format=csv[/tt] to any search key request to return the CSV. A request like:

<yourVeraIP>:3480/data_request?id=lr_dmDB&name=Temperature&device=4&format=csv

gives output like:

1359906435,8
1359908709,9
1359944384,8
1359947984,9
1359951584,10
1359955184,9.9
1359956984,10.1
1359958784,10

The [tt]DataYours[/tt] prototype does the same thing (for Whisper data), although in a ‘Graphite standard’ csv format:

...:3480/data_request?id=lr_render&target=Vera-35104571.131.urn:micasaverde-com:serviceId:EnergyMetering1.KWH&from=-d&format=csv

gives:

entries,2014-02-12 22:00:00,68638.192
entries,2014-02-12 22:20:00,68639.304
entries,2014-02-12 22:40:00,68640.484
entries,2014-02-12 23:00:00,68641.614
entries,2014-02-12 23:20:00,68642.668
entries,2014-02-12 23:40:00,68644.7
entries,2014-02-13 00:00:00,68645.682
entries,2014-02-13 00:20:00,68646.732
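If you want to post-process that output yourself, here is a minimal parsing sketch. It assumes each line has exactly three comma-separated fields (name, timestamp, value) as in the sample above, and that an empty value field means no data; [tt]parse_csv[/tt] is a hypothetical name:

```lua
-- Sketch only: parse 'name,timestamp,value' lines like the sample above.
-- An empty value field becomes nil.

local function parse_csv (text)
  local rows = {}
  for line in text:gmatch "[^\r\n]+" do
    local name, time, value = line:match "^([^,]*),([^,]*),([^,]*)$"
    if name then
      rows[#rows + 1] = { name = name, time = time, value = tonumber (value) }
    end
  end
  return rows
end

local sample = "entries,2014-02-12 22:00:00,68638.192\n" ..
               "entries,2014-02-12 22:20:00,68639.304\n"

for _, row in ipairs (parse_csv (sample)) do
  print (row.time, row.value)
end
```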

Ok, I dug a bit further in the code and found it in the core handler rather than on a predefined report.

more awesomeness!

…answered long ago in this post! [url=http://forum.micasaverde.com/index.php/topic,13597.msg101676.html#msg101676]http://forum.micasaverde.com/index.php/topic,13597.msg101676.html#msg101676[/url]

[quote]…answered long ago in this post! [url=http://forum.micasaverde.com/index.php/topic,13597.msg101676.html#msg101676]http://forum.micasaverde.com/index.php/topic,13597.msg101676.html#msg101676[/url][/quote]
Sorry for the cross post… I just started having fun with my setup, after installing over 20 Zwave devices in the house, adding basic scenes and playing with URL based NFC tags, etc I’ll read that other thread.

You’re surely too busy with Arduino ? :wink:

I would like to create a flexible energy consumption report based on accumulated data per day/week/month/year and device/room/whole house in combination with category (lighting, heating, appliance etc.). Is this something I can already or will be able to do with DataYours or with the help of?

It would be really nice to be able to embed these kind of reports into a customized dashboard/interface. I realize I might be on my own here though… :slight_smile: