Thanks for the repost - a good choice of place to continue the conversation!
[quote]The Graphite/Whisper stuff looks really good.[/quote]
Whisper [u]is[/u] good, because you can, if you like, just run it as a round-robin database, putting a limit on the maximum duration you're ever interested in, [u]or[/u] you can aggregate the data gracefully over time, reducing the resolution but increasing the maximum time limit. All I've done is to translate it to Lua from Python.
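To make those two modes concrete, here is a rough Python sketch. It is not the real Whisper code or file format, just an illustration of the idea; the names [tt]RingArchive[/tt] and [tt]downsample[/tt] are made up for this post, and averaging is only one of several possible aggregation choices.

```python
from statistics import mean


class RingArchive:
    """Fixed-size round-robin store (illustrative only, not Whisper's format).

    The slot for a point is derived from its timestamp, so once the archive
    is full, new points overwrite the oldest ones.
    """

    def __init__(self, step, slots):
        self.step = step              # seconds per point
        self.slots = [None] * slots   # maximum duration = step * slots

    def write(self, ts, value):
        ts -= ts % self.step          # align to the archive resolution
        self.slots[(ts // self.step) % len(self.slots)] = (ts, value)

    def read(self, start, end):
        return sorted(p for p in self.slots
                      if p is not None and start <= p[0] < end)


def downsample(points, step, factor):
    """Average fine-resolution points into buckets `factor` times coarser."""
    coarse_step = step * factor
    buckets = {}
    for ts, value in points:
        if value is None:             # skip gaps in the data
            continue
        bucket_start = ts - (ts % coarse_step)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

With a step of 60 s and 3 slots, a fourth write simply replaces the first point (the round-robin case), while [tt]downsample[/tt] shows the trade of resolution for a longer time span.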
[quote]What's the plan for importing existing dataMine data? The data conversion is easy enough, but through what method will the user experience the import process? Or would you write a "custom finder"?[/quote]
There are two ways to go here (not mutually exclusive):
[ol][li]data conversion - as you’ve mentioned, could be en bloc or on an as-needed basis. I’ve found my dataMine archive getting rather ragged as old devices go and new ones replace them. Selective conversion makes sense. Obviously some user involvement here to choose what’s wanted.[/li]
[li]database federation - quite possible to add a thin layer which makes it invisible whether the data is in Whisper or dataMine. That way, with almost no user involvement, you could ensure that the old data is still available through the new interface.[/li][/ol]
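A minimal sketch of what such a federation layer could look like, in Python for illustration. Everything here is hypothetical: [tt]DictStore[/tt] just stands in for a real Whisper or dataMine reader, and the [tt]fetch(name, start, end)[/tt] interface is an assumption, not an existing API.

```python
class DictStore:
    """Stand-in for a real backend (Whisper or dataMine); hypothetical."""

    def __init__(self, data):
        self.data = data  # {series name: [(timestamp, value), ...]}

    def fetch(self, name, start, end):
        return [(t, v) for t, v in self.data.get(name, [])
                if start <= t < end]


class FederatedStore:
    """Thin layer that hides which backend actually holds the data."""

    def __init__(self, primary, fallback):
        self.primary = primary    # e.g. the new Whisper database
        self.fallback = fallback  # e.g. the old dataMine archive

    def fetch(self, name, start, end):
        # Start with the old data, then let the new store win on overlaps.
        points = dict(self.fallback.fetch(name, start, end))
        points.update(self.primary.fetch(name, start, end))
        return sorted(points.items())
```

The caller only ever talks to [tt]FederatedStore[/tt], so old and new data come back as one series.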
[quote]How will metadata be handled? For example, dataMine knows what category of energy consumption a device falls into, e.g. HVAC, Whole_House, Entertainment.... Plus data offsets can be applied to data to calibrate it.[/quote]
An excellent question! Again, two answers:
[ol][li]device category - In EventWatcher, I tried to use Vera’s device categorization, but it falls short because it is not applied uniformly (or, in some cases, at all). The only reliable categorization is the device / serviceId / variable parameter group, as used in [tt]luup.variable_set[/tt] or [tt]luup.variable_get[/tt]. The Graphite/Whisper database presents a tree-structured hierarchy which maps directly onto this. I’m using a namespace of the form: [tt]Vera-serialNo.deviceNo.serviceId.variable[/tt], so there’s no ambiguity about which variable (e.g. Watts) belongs to which device. Wildcards are also allowed in some contexts, so [tt]Vera-12345678.*.urn:micasaverde-com:serviceId:EnergyMetering1.Watts[/tt] refers to all the power variables on that machine.
[/li]
[li]other metadata - Chris’s philosophy for dataMine was that raw data was sacrosanct and shouldn’t be changed. The offsets and filters are just a layer above that in the graphics. Same will apply here, but it may need a bit more care since an erroneous reading will get propagated through the Whisper aggregation process as the data ages. Indeed, a separate ‘database’ will be needed for this metadata… MySQL ??? (no, only joking, I currently plan to retain the same metadata files as dataMine)[/li][/ol]
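As an illustration of the namespace and the wildcard, here is a small Python sketch. The helper names are made up, device number 42 is just an example, and it assumes a [tt]*[/tt] only ever matches within one dot-separated node and that serviceIds contain no dots of their own.

```python
from fnmatch import fnmatchcase


def metric_path(serial, device, service_id, variable):
    """Build a name of the form Vera-serialNo.deviceNo.serviceId.variable."""
    return f"Vera-{serial}.{device}.{service_id}.{variable}"


def matches(pattern, path):
    """Match node by node, so '*' never spans a '.' separator (assumption)."""
    pattern_nodes = pattern.split(".")
    path_nodes = path.split(".")
    return (len(pattern_nodes) == len(path_nodes)
            and all(fnmatchcase(node, pat)
                    for node, pat in zip(path_nodes, pattern_nodes)))


ENERGY = "urn:micasaverde-com:serviceId:EnergyMetering1"
watts = metric_path("12345678", 42, ENERGY, "Watts")
all_watts = f"Vera-12345678.*.{ENERGY}.Watts"
print(matches(all_watts, watts))
```

Any device on that machine reporting [tt]Watts[/tt] under the energy metering service would match the pattern, whereas a different variable on the same device would not.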
[quote]Also what hardware do you see this software running on - that supports Apache, Python, etc?[/quote]
Well, my initial requirement was just Vera. All the above sounds rather heavy, but in fact it's not at all. Obviously the data collection front-end has to run on Vera and I'm currently testing a version which:
[ul][li]is ~200 lines long[/li]
[li]doesn't run as a plugin, so doesn't need its own stack space[/li]
[li]doesn't need any file system (e.g. CIFS or USB) to get the data off Vera[/li]
[li]is fully configurable over HTTP[/li]
[li]writes UDP packets to syslog and another data logging port[/li][/ul]
The output is designed to feed easily into: StatsD, RRD, Graphite, syslog, ...
The design route I’ve been following ensures a modular structure with standard interfaces, so whilst, like dataMine, it could all run on Vera, it could equally well be split across multiple, heterogeneous systems. I currently have three front-end acquisitions feeding into one syslog and another UDP socket for archival storage (but not yet connected up). The graphics from dmDBserver, or EventWatcher, are very lightweight too, and the rendering is done in the browser itself. The translated Whisper database is pure Lua so will run anywhere.
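To give a feel for how little is involved in the UDP side, here is a Python sketch of a sender. The actual packet layout of my front-end isn't shown here; this one assumes Graphite's plaintext line format ([tt]path value timestamp[/tt]), and the function names, host and port in the usage comment are placeholders.

```python
import socket
import time


def format_line(path, value, timestamp=None):
    """Format one reading as a Graphite-style plaintext line (assumed format)."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None):
    """Fire-and-forget: one UDP datagram per reading, no reply expected."""
    payload = format_line(path, value, timestamp).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


# Example (placeholder host and port):
# send_metric("192.168.1.10", 2003, "Vera-12345678.42.svc.Watts", 12.5)
```

Because UDP needs no connection and no reply, the sender never blocks waiting on the receiver, which is what keeps the front-end so small. The flip side is that a datagram sent while the receiver is down is simply lost.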
[quote]I can imagine this running on a NUC, that's say recording TV programs for me and simultaneously talking to any number of Veras and any other enabled equipment.[/quote]
Whatever takes your fancy.
[quote]Given the number of reboots I experience, I worry about what would happen if Vera lost contact with the database. Any chance of caching the data for a day or so / sounding the alarm when the link dies?[/quote]
Once again, the acquisition doesn't rely on a file system, so if Vera's up (and your network) then all should be well. You could also have a local database, but I haven't thought about synchronisation on network restarts in such a configuration.
[quote]Sorry for all the questions - this looks very promising ...[/quote]
The questions are fine - I'm not sure whether the answers do them justice, so ask again if not clear.
[quote]...but will certainly require the user to have a high level of knowledge re: computers.[/quote]
No, I think you're wrong. The plan is that out-of-the-box this will be easier than dataMine if you just want to do a plain Vera install. If you're capable, though (and I assume that someone running a NAS with MySQL or an external server will be), then lots of configurations might be possible. The plan to interface easily to existing tools means that we don't have to roll our own.