DataYours: a prototype next-generation dataMine

@ akbooer,

I have a request. As development of this proceeds, would it be possible to log to a network share and have a larger database? I have tons of space on my file server that I’d love to fill with this data. I’d like the resolution of the data never to decrease, and I’d like to keep as much free space on my Vera as possible.

Another month of rather slow progress on the DataYours GUI…

A few more snapshots below show how things are coming along:

[ul][li]configuration - one of the challenges has been how to arrange for configuration of a number of ‘daemons’ possibly scattered across several processors. The latest version of each of the acquisition / storage / display daemons now has its own mini web page; these are brought together by DataYours into one single panel, really only used to set things up initially.[/li]
[li]device selection - again a challenge spread over multiple (unbridged) processors. I’m a big fan of the ‘TreeMap’ style of displaying many multi-dimensional parameters. This plot enables simultaneous viewing of every device (in this case, 121 of them) across three Veras. The ability to view the name and value of any variable in any of those devices is just one click away. Configuring multiple archives for any variable is also done from here.[/li]
[li]graphics and multiple databases - a TreeMap is also used to view all variables stored in the ‘Whisper’ database (63 on show here) and, again, the ability to plot any one of those is just one click away. The capability also extends to plotting variables and saved graphs from the dataMine database.[/li][/ul]

Testing by a third party would be appreciated, but as this is still work in progress it may need a bit of iteration to get things running smoothly. I’ve run out of machines to try a clean install on!

Ha! You posted whilst I was writing my latest.

One of my Whisper databases (I have two set up, doing ‘data sharding’) is already on an external NAS (an Apple Time Capsule). I currently have 63 variables being logged with a maximum duration of 10 years (although this has only been running for a couple of months so far!) There’s no requirement whatsoever to configure multi-resolution archives; the only constraint, because this is a round-robin database, is that there has to be SOME upper limit to the duration.


[Edit: note that a single (double-precision) datapoint with timestamp takes 36 bytes. 5-minute samples for 10 years is about 1 million points, so 36 Mbyte with no aggregation. 30 such channels, then, is ~1 Gbyte. Not a problem.]
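
As a quick sanity check of those figures, here is the arithmetic spelled out (a sketch only: the 36 bytes per point comes from the post above; the post rounds the point count down to 1 million, which is why it quotes 36 Mbyte rather than ~38):

```python
# Back-of-envelope check of the storage figures quoted above.
# Assumption: each stored point is a (timestamp, double) pair taking 36 bytes.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
sample_interval = 5 * 60          # one point every 5 minutes
years = 10
channels = 30
bytes_per_point = 36

points = years * SECONDS_PER_YEAR / sample_interval
per_channel_mb = points * bytes_per_point / 1e6
total_gb = channels * per_channel_mb / 1e3

print(round(points / 1e6, 2), "million points per channel")  # 1.05
print(round(per_channel_mb), "MB per channel")               # 38
print(round(total_gb, 2), "GB for 30 channels")              # 1.14
```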

This is great! Good work! ;D

I am using a similar setup to AgileHumor, but instead of logging to a DB, I am feeding the data into splunk. I only have a couple of hours of data so far, but I’m excited about the dashboards I’ll be able to create.

I will post back with some progress. Thanks again!

That’s good to hear. It was my hope that, by making it modular with open data formats, it could easily be fed into other analysis tools. I hadn’t heard of splunk, but having now looked, it seems intriguing.

I will post back with some progress. Thanks again!

Please do!

PS: You see that some DataYours discussion is now taking place on the beta test thread: [url=http://forum.micasaverde.com/index.php/topic,24669.0.html]http://forum.micasaverde.com/index.php/topic,24669.0.html[/url]

Re: Splunk, it’s an enterprise log analysis tool. Its real beauty is that it can consume any log or semi-structured file and somehow pull data out of it, or you can help it along with regex coding.

The downside is that pricing starts in the 6-figure range (we’ve used it where I work), but the free license allows you to consume up to 500MB a day, which is perfect for a homeowner. At work we produce hundreds of GBs a day of log data across all our systems.

Now that I think of it, Splunk would be perfect for HA analysis. There’s a similar product called SumoLogic.

-TC

[quote=“TC1, post:66, topic:179386”]Re: Splunk, it’s an enterprise log analysis tool. Its real beauty is that it can consume any log or semi-structured file and somehow pull data out of it, or you can help it along with regex coding.

The downside is that pricing starts in the 6-figure range (we’ve used it where I work), but the free license allows you to consume up to 500MB a day, which is perfect for a homeowner. At work we produce hundreds of GBs a day of log data across all our systems.

Now that I think of it, Splunk would be perfect for HA analysis. There’s a similar product called SumoLogic.

-TC[/quote]

Splunk is pretty great for all sorts of stuff. It is very expensive (but worth it IMO) for enterprise use. But the free license is perfect for home use. You can apply for a developer license that allows for a bit more indexing, and unlocks some of the enterprise features as well, but it is only good for 6 months or so, then you have to re-apply.

Looks very cool. I’m going to wade my way through dataMine first then try my hand at DataYours. Impressive work so far!

Is there anything that dataMine does in particular that you don’t see in DataYours?

[quote]Is there anything that dataMine does in particular that you don’t see in DataYours?[/quote]

The only thing I’ve noticed (and it’s not your fault) is that ImperiHome will display graphical data based on DataMine. Once again, I don’t consider this your fault at all. Once DataYours is out of Beta, I planned on asking them to add it.

I’m not trying to speak for @sabolcik; I can only think of that one thing, and I’ve never used Datamine before, either. So what do I know :smiley:

One thing folks need to keep in mind is that DataMine is no longer being actively developed. What you see is what you get, and that’s it.

Another thing to note is that dataMine is quite resource intensive. When I removed it, I got back ~1.2 Mbyte of ‘disk’ space and several Mbyte of RAM. DataYours code size is, in comparison, about 60 kB. Of course, they do slightly different things.

I am new to Vera world, going on week 2, so please forgive me if this has been discussed and I missed it.

I am quite familiar with industrial data historians; they use a data-compression algorithm to reduce storage space and improve history-recall times. A simple compression algorithm is to check the current value against the previous value: only if it is greater or less than that by x% does the data get stored. So, if you have a temperature measurement that collects data every 6 minutes, and it has not changed, don’t log it until it does. This becomes important when logging 100k data points every 10 seconds, but it might be an easy, nice addition to the code.
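
The percentage-deadband idea described above can be sketched in a few lines (a minimal illustration only; the function name and thresholds are made up here, not taken from any historian product or from DataYours):

```python
# Percentage-deadband compression sketch: a new sample is stored only if it
# differs from the last *stored* value by more than `pct` percent of it.

def deadband_filter(samples, pct):
    """Yield the (timestamp, value) pairs that pass the deadband test."""
    last = None
    for t, v in samples:
        if last is None or abs(v - last) > abs(last) * pct / 100.0:
            last = v          # latch the newly stored value
            yield (t, v)

# Temperature read every 6 minutes (360 s); only real changes are kept.
readings = [(0, 20.0), (360, 20.1), (720, 20.05), (1080, 21.5), (1440, 21.6)]
kept = list(deadband_filter(readings, pct=5))
# Only the first sample and the jump to 21.5 exceed the 5% band:
# kept == [(0, 20.0), (1080, 21.5)]
```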

Looking forward to getting this app up and running

Actually, this is already a feature of MiOS watching device variables, and one which is carried directly over in the dataMine app (sadly, not now under further development.) The alternative “round robin” approach uses a fixed archive size, and this is the method used by a number of data historian-type stores including RRD and Whisper (which is what underlies DataYours.) The extra twist that Whisper brings is to enable multiple archives with a monotonic decrease in sample rate and an increase in duration.

The downside of only recording changes is that it makes interpretation of the data difficult; this is addressed in both dataMine and DataYours by judicious use of latching the last data value in the plotting algorithm. But it’s only a visual trick, and using the data correctly for anything other than plotting requires some care. I’m thinking of providing an alternative data-polling front end to mitigate this problem.
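
The "latch the last data value" trick is essentially a zero-order hold: at each plot time, repeat the most recent stored value. A minimal sketch (function and variable names are illustrative, not from either plugin):

```python
# Expand change-only data to a step series for plotting: each plot time
# reports the last recorded change at or before it (zero-order hold).

def latch(changes, plot_times):
    """Return (time, value) pairs holding the last change at or before each time."""
    changes = sorted(changes)
    out, last, i = [], None, 0
    for t in plot_times:
        while i < len(changes) and changes[i][0] <= t:
            last = changes[i][1]
            i += 1
        out.append((t, last))
    return out

changes = [(0, 18.0), (600, 21.0)]          # only two recorded changes
series = latch(changes, plot_times=[0, 300, 600, 900])
# series == [(0, 18.0), (300, 18.0), (600, 21.0), (900, 21.0)]
```

This is exactly why the post calls it a visual trick: the held values at 300 and 900 were never measured, so any statistics computed on the expanded series need care.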

However, I’m looking forward to discussing your ideas and experiences in order to bring the best into DataYours (still very much under development - I have a newer version than that published so far, so do ask if you’re serious in wanting to test the latest.)

akbooer
Are you going to publish this in the Vera App Store soon, so we can get auto updates?

I just installed this… looking forward to trying it.

Question:
I added several items (they seemed to ‘confirm’ in the browser when running the URL), but I’m not seeing them logged in the /whisper/ folder. All I have there are 2 files:
.163.urn^micasaverde-com^serviceId^SecuritySensor1.Tripped
.185.urn^micasaverde-com^serviceId^SecuritySensor1.Tripped

I’m adding things like…
http://192.168.2.71:3480/data_request?id=lr_DataWatcher&watch=*.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature
http://192.168.2.71:3480/data_request?id=lr_DataWatcher&watch=*.urn:upnp-org:serviceId:HVAC_UserOperatingMode1.ModeStatus
http://192.168.2.71:3480/data_request?id=lr_DataWatcher&watch=*.urn:micasaverde-com:serviceId:DoorLock1.Status
http://192.168.2.71:3480/data_request?id=lr_DataWatcher&watch=*.urn:micasaverde-com:serviceId:DoorLock1.sl_UserCode
http://192.168.2.71:3480/data_request?id=lr_DataWatcher&watch=*.urn:micasaverde-com:serviceId:DoorLock1.sl_VeryLowBattery

?

thx for the help!

Yes, as soon as I get enough positive feedback (not that I’ve got much negative, just not yet a lot of anything at all.) I have to say, though, that I’m not a fan of auto-updates.

[quote]I added several items (seemed to 'confirm' in browser when running the URL) I'm not seeing them logging in the /whisper/ folder? All I have there are 2 files .163.urn^micasaverde-com^serviceId^SecuritySensor1.Tripped .185.urn^micasaverde-com^serviceId^SecuritySensor1.Tripped[/quote]

This is all very out of date and only an initial proof of concept. You shouldn’t be using the code on this thread, but the one from the beta test thread here: [url=http://forum.micasaverde.com/index.php/topic,24669.msg171205.html#msg171205]http://forum.micasaverde.com/index.php/topic,24669.msg171205.html#msg171205[/url], which has a much more comprehensive dashboard interface (and some documentation in the form of a user manual). You also need to apply the few updates pointed to in that first post.