dmDB - a read-only wrapper for the dataMine database

dataMine is great - more’s the pity that development is currently stalled.

As it stands, though, it is a write-only memory so far as Lua code is concerned… it has all that great data which you can plot interactively, but how much better if you could access this programmatically?

I’ve written lightweight API to read the dataMine data so that we can build on the dataMine legacy data.

For a rationale of what this is about, and the documentation and code, see this thread:
[url=http://forum.micasaverde.com/index.php/topic,16166.msg135887.html#msg135887]http://forum.micasaverde.com/index.php/topic,16166.msg135887.html#msg135887[/url]

For the explanation as to why I’ve started a new thread, blame @Aaron!
[url=http://forum.micasaverde.com/index.php/topic,17232.msg135898.html#msg135898]http://forum.micasaverde.com/index.php/topic,17232.msg135898.html#msg135898[/url]

Comments welcomed here, rather than on those other threads.

Please forgive my ignorance but can you give some nuts and bolts examples in a more human readable fashion describing the utility of this? Am I correct in that this is simply a necessary stepping stone for more sophisticated interaction with the data which datamine can gather? I’m currently not using datamine because about a year ago when I tinkered with it, it only really logged things in a fashion unsuitable for my needs. Seeing a bar chart or some such of energy usage or indoor temperatures is not of particular interest to me. Being able to look back and see when certain people entered the house is. Maybe this is possible with datamine and I just never figured out how to get it to show up.

Sorry to be so obscure.

Am I correct in that this is simply a necessary stepping stone for more sophisticated interaction with the data which datamine can gather?
It certainly can be that. As I said, a further abstraction layer which allows not just the retrieval of raw data, but some useful statistical analysis and interpolation is very much what real 'data mining' is all about.
I'm currently not using datamine because about a year ago when I tinkered with it, it only really logged things in a fashion unsuitable for my needs. Seeing a bar chart or some such of energy usage or indoor temperatures is not of particular interest to me.
No indeed, it shouldn't just be about interactive plotting. However, you can't deny that this is of great utility to some people.
Being able to look back and see when certain people entered the house is. Maybe this is possible with datamine and I just never figured out how to get it to show up.
OK, so if security is your thing, then how about:

[ul][li]retrieve, for the last week/month, the number of times event X (eg. “certain people entered the house”) happened[/li]
[li]the number of times a door was open for more than Y seconds[/li]
[li]the number of times a day a certain PIR is triggered[/li]
[li]the average length of time a room was occupied[/li]
[li]whether two events happened within Z minutes of each other…[/li][/ul]

All this, and more certainly, available with a modicum of additional effort once you have access to the history data in a programmatic fashion.

Please forgive. I never meant to suggest that datamine is useless. Just that it isn’t particularly useful for me and the specific interests I have at this point in time.

Your following examples are very interesting to me though. So your utility would out of the box allow such queries? If so I will begin to experiment with it tonight. Is there an additional interface I would need to make such queries?

OK, so example #1: “retrieve, for the last week/month, the number of times event X (eg. “certain people entered the house”) happened”, then something like this:

(a) get the data:

local start = os.time {day = 1,  month = 10, year = 2013}
local stop  = os.time {day = 31, month = 10, year = 2013}
dm,status = dmDB.open ()
dmc,status = dm:openCursor {key = {Variable = "Status", Device = 51} -- or something like this
x,t, status = dmc:getSearchKeyRange {t1 = start, t2 = stop }
dmc:close ()
dm:close()

(b) interpret it:

local n = 0
for i = 2,#x do
    if x[i-1] == "0" and x[i] == "1" then n = n + 1 end -- look for 0 -> 1 transition
end

…gives you “n”, the number of times a sensor has gone from 0 to 1 during October. If it’s detecting what you want, then it’s a useful answer.

Here’s a much easier way to access dataMine data using the dmDB module… an HTTP server which responds to queries with CSV time/value pairs for a specified channel over a specified time range.

Attached to this post is an updated version of the dmDB module, which can simply be loaded onto Vera using the Luup files menu on the Develop Apps page.

The following server code can then be added to your Startup Lua, or just tried out in the Test Luup code section.

require "dmDB"

dm, dmDBstatus = dmDB.open ()
luup.log (dmDBstatus)
luup.log ('dmDB version = '..dmDB.version)

----
--dmDB_handler() responds to requests like:
-- [your_Vera_url]:3480/data_request?id=lr_getSearchKeyRange
--&output_format=csv
--&DeviceNum=7
--&serviceId=urn:upnp-org:serviceId:Dimming1
--&Variable=Test
--&From=42
--&To=88
function dmDB_handler (lul_request, lul_parameters, lul_outputformat)
	local max_points = 10080								-- save Vera from running out of memory
    local cursorKey, keyRange = {}, {}
	local function query ()
    	local keys = {
    		serviceid = "Service", service = "Service", 				-- openCursor keys and aliases
    		devicenum = "Device",  device = "Device",
    		id = "Id", channel = "Id",
    		variable = "Variable", name = "Name"} 
		local times = {
			t1 = "t1", from = "t1",	start = "t1",			-- searchKeyRange parameters and aliases
			t2 = "t2", to = "t2", stop = "t2"}
	    local k,l,t
	    for i,j in pairs (lul_parameters) do
	    	l = string.lower(i)
	    	k = keys[l] 
	    	t = times[l]
	    	if k then cursorKey[k] = j elseif t then keyRange[t] = tonumber(j) end	-- ignored if neither key nor time
	    end
	    local dmc, status = dm:openCursor {key = cursorKey}
	    if dmc then
	    	local n = 0
	    	local cpu = os.clock()
	    	status = {}
			for v,t in dmc:searchKeyRange (keyRange) do	
				n = n + 1
				status[n] = ("%d,%g"): format (t,v)	-- currently use csv as default format	
				if n == max_points then break end
			end  	
			status = table.concat (status, '\n')
			luup.log (("CPU time for request %s = %.3f mS for %d points"): format (lul_request,(os.clock()-cpu)*1e3, n))
			dmc:close ()
	    end    
	    return status
	end    
    local lul_content_type = "text/plain"
	local ok, result = pcall (query)    
    return result, lul_content_type
end

luup.register_handler ("dmDB_handler", "getSearchKeyRange")

Using the default json-dm module, this startup code can take about 10 seconds to run, but this can be changed by editing the line:

dm, dmDBstatus = dmDB.open ()

to

dm, dmDBstatus = dmDB.open {json = "your-favourite-json-module" }

…the one I use is about 20 times faster. However, once open, the actual queries don’t depend on the speed of the json module.

The server responds to HTTP requests of the form:

http://[your Vera IP address]:3480/data_request?id=lr_getSearchKeyRange&Variable=Varname&DeviceNum=42&from=1376356271&to=1377572571

with a comma-separated list of the data (time/value pairs) within the given range of (UNIX) time stamps of the sort:

1376356271,0.275625
1376356331,0.278264
1376356391,0.280903
1376356452,0.283542
1376356512,0.286181
1376356572,0.288819

The possible request parameters and the meaning of their values are:

[ul][li][tt]Service[/tt] or [tt]ServiceId[/tt] - ServiceId[/li]
[li][tt]Variable[/tt] - Luup variable name[/li]
[li][tt]Device[/tt] or [tt]DeviceNum[/tt] - Device number[/li]
[li][tt]Id[/tt] or [tt]Channel[/tt] - dataMine channel number[/li]
[li][tt]Name[/tt] - dataMine channel name[/li]
[li][tt]T1[/tt] or [tt]From[/tt] or [tt]Start[/tt] - start Unix timestamp[/li]
[li][tt]T2[/tt] or [tt]To[/tt] or [tt]Stop[/tt] - ending Unix timestamp[/li][/ul]

As with the dmDB module itself, any mix of the above parameters may be defined, in any order, and their values (not their names listed above) can be truncated, so long as the whole query uniquely identifies a dataMine channel being logged. Usually sufficient is Luup variable name and device number. The T1/T2 or start/stop or from/to parameters define the time window of interest. T1 defaults to the earliest time in the database, and T2 defaults to the current time.

To save yourself from crashing Vera through using up all the memory, I’ve arbitrarily limited the number of returned points to a maximum of 10,080. This runs perfectly happily on my VeraLite and allows queries to span, for example:

[ul][li]1 year @ 1 hour sample rate = 8,760[/li]
[li]1 month @ 5 minutes sample rate = 8,928 (for 31 days)[/li]
[li]1 week @ 1 minute sample rate = 10,080[/li][/ul]

I hope this is of some use to those wishing to do some real data mining, especially as it opens up the possibility of using a separate machine to crunch the numbers subsequently. My goal is still to wrap this up in another layer of code (and, no doubt, a plugin) which will provide real statistical data queries and resampling capabilities, with a variety of response formats.

Feedback welcomed.

Here’s an update of the previous post with a version of the HTTP server which now allows ISO 8601 extended-format date parameters, as well as the original Unix epoch in seconds, so human-readable dates can now be used.

The format syntax is: YYYY-MM-DDThh:mm:ss which can be truncated anywhere for lower precision with usual Lua defaults (if you leave the hours out, that’s 12 noon). Be aware that 2013 on its own, for example, will be treated as a Unix time, whereas 2013- is the year.

Server requests for dataMine history can now look like:

http://[your Vera IP address]:3480/data_request?id=lr_getSearchKeyRange&Variable=Varname&DeviceNum=42&from=2013-01-01T00&to=2013-03-31T23:59

Here’s the Lua code for the server (see previous post for the dmDB module and instructions for use).

require "dmDB"

dm, dmDBstatus = dmDB.open ()
luup.log (dmDBstatus)
luup.log ('dmDB: version = '..dmDB.version)

----
--dmDB_handler() responds to requests like:
-- [your_Vera_url]:3480/data_request?id=lr_getSearchKeyRange
--&output_format=csv
--&DeviceNum=7
--&serviceId=urn:upnp-org:serviceId:Dimming1
--&Variable=Test
--&From=42		-- or ISO-8601 extended date/time format  YYYY-MM-DDTHH:MM:SS
--&To=88
function dmDB_handler (lul_request, lul_parameters, lul_outputformat)
	local max_points = 10080											-- save Vera from running out of memory
    local cursorKey, keyRange = {}, {}

    local function dateTime (str)			-- just a number, or ISO date/time format...		
    	if str:find "^%d+$" then return tonumber (str) end
		local field   = {str:match "^(%d%d%d%d)-?(%d?%d?)-?(%d?%d?)T?(%d?%d?):?(%d?%d?):?(%d?%d?)"}
		local name    = {"year", "month", "day", "hour", "min", "sec"}
		local default = {0, 1, 1, 12, 0, 0}
		local datetime = {}
		for i,j in ipairs (name) do
			if field[i] == ''
				then datetime[j] = default[i]
				else datetime[j] = field[i]
			end
		end
    	return os.time (datetime)
	end

	local function query ()
    	local keys = {
    		serviceid = "Service", service = "Service", 				-- openCursor keys and aliases
    		devicenum = "Device",  device = "Device",
    		id = "Id", channel = "Id",
    		variable = "Variable", name = "Name"} 
		local times = {
			t1 = "t1", from = "t1",	start = "t1",			-- searchKeyRange parameters and aliases
			t2 = "t2", to = "t2", stop = "t2"}
	    local k,l,t
	    for i,j in pairs (lul_parameters) do
	    	l = string.lower(i)
	    	k = keys[l] 
	    	t = times[l]
	    	if k then cursorKey[k] = j elseif t then keyRange[t] = dateTime(j) end	-- ignored if neither key nor time
	    end
	    local dmc, status = dm:openCursor {key = cursorKey}
	    if dmc then
	    	local n = 0
	    	local cpu = os.clock()
	    	status = {}
			for v,t in dmc:searchKeyRange (keyRange) do	
				n = n + 1
				status[n] = ("%d,%g"): format (t,v)	-- currently use csv as default format	
				if n == max_points then break end
			end  	
			status = table.concat (status, '\n')
			luup.log (("dmDB: CPU time for request %s = %.3f mS for %d points"): format (lul_request,(os.clock()-cpu)*1e3, n))
			dmc:close ()
	    end    
	    return status
	end    
    local lul_content_type = "text/plain"
	local ok, result = pcall (query)    
    return result, lul_content_type
end

luup.register_handler ("dmDB_handler", "getSearchKeyRange")

As ever, feedback welcomed.

This looks great! Thank you for spending the time developing it. I will install it this week and do some testing.

Hi Firebird… Yes, it’s all your fault, so thanks again for suggesting this! I’ll probably have another update for you before next weekend. Anything else I should add at the moment?

…and here it is. Changes are:

[ul][li]handler code now included in dmDB module itself[/li]
[li]maxpoints option to dmDB.open to limit maximum number of points returned in keyRange queries[/li]
[li]times now optionally output by handler in ISO 8601 format (ie. human-readable)[/li][/ul]

All the code you now need to have in Lua startup or test is:

require "dmDB"
dm, dmDBstatus = dmDB.open ()
luup.log ("dmDB: "..dmDBstatus)
luup.log ('dmDB: version = '..dmDB.version)
dmDB_handler = dm.query
luup.register_handler ("dmDB_handler", "getSearchKeyRange")

If you want to use the maxpoints option in the open database call, the syntax is:

dm, dmDBstatus = dmDB.open {maxpoints = 10000} -- or whatever number you want

To use the ISO date/time format option on output, use an HTTP request like:

http://[Vera IP]:3480/data_request?id=lr_getSearchKeyRange&Variable=Varname&output_format=iso

which gives output like this:

2013-02-03T15:47:15,8
2013-02-03T18:25:00,7
2013-02-04T00:35:45,6
2013-02-04T03:41:02,7
2013-02-04T09:51:47,9

Code attached. Feedback welcome.

Looks good! Thank you nice that I don’t have to convert from Unix time now :slight_smile:

I’m sorry, another update to fix a possible server crash if zero points are found… a tyro error, I’ve instigated better unit testing.

As a bonus, the ISO date format for queries now supports ordinal dates, that is, day numbers. You could represent December 9th, 2012, (the 344-th day of 2012) as either of:

[ul][li]2012-12-09[/li]
[li]2012-344[/li][/ul]

Is their a simple why to use the ISO format in OpenOffice/Excel? If not can there be an OpenOffice/Excel time parameter for the URL? The time zone would default to the user’s locale in Vera (and perhaps could be overridden in the URL if needed).

This formula appears to be the recommended one. You end up with a decimal number that can then be expressed in various formats in the spreadsheet:

=(UNIX_Time_Stamp/86400)+25569+(Time_Zone/24)

http://spreadsheetpage.com/index.php/tip/converting_unix_timestamps/

You said your json parser was faster than the dataMine one - which one are you using? And the time that dmDB takes to respond - is that primarily a function of the parser?

Still thinking about what I could do with this but it’s working well so far - good one!!

The simplest way is not to use it at all. Since you have the formula for Unix time conversion, why not use that instead? The ISO format makes the output 50% bigger and is at least twice as slow to generate.

You said your json parser was faster than the dataMine one - which one are you using? And the time that dmDB takes to respond - is that primarily a function of the parser?
It's the initial reading of the configuration file which takes the time if you're using the default json-dm module, but this is only done once at the time when the startup code is run. Here's a comparison I've just run between the two, taken from the Luup log file: [code] 50 11/23/13 10:34:11.029 luup_log:0: dmDB: dataMine configuration file at '/dataMine/dataMineConfig.json' opened OK, CPU = 440.000 mS 50 11/23/13 10:34:11.029 luup_log:0: dmDB: version = 2013.11.23 @akbooer <0x2e517680> ... 50 11/23/13 10:34:41.427 luup_log:0: dmDB: dataMine configuration file at '/dataMine/dataMineConfig.json' opened OK, CPU = 7330.000 mS [/code] So this json module is 7330/440 = 16.6 times faster. It's also only 9kB rather than 16kB. Code attached below... this needs to be copied with scp, or some such, to /usr/lib/lua/ and not downloaded as a usual Lua file. The only thing it doesn't do is full UTF coding of strings, but then you don't need that on Vera.
Still thinking about what I could do with this but it's working well so far - good one!!
Thanks. I have lots of applications in the pipeline - this was not written just for fun but because I really want to use it. To begin with I'll replace all my scheduled and [i]ad hoc[/i] calculations like max/min temperatures, daily energy use, heating degree days, etc. with on-demand server responses. This way, you can ask unanticipated questions of your system (which if you have a speech interface to your system you are likely to do.) There's actually much more too, but that's under wraps for the moment.

A completely revamped version of dmDB and its HTTP request server is now posted here…

[url=http://forum.micasaverde.com/index.php/topic,18749.0.html]http://forum.micasaverde.com/index.php/topic,18749.0.html[/url]

This includes the capability to display graphical plots of dataMine channels and graphs.

Hi akbooer,

I’m doing an update to the OpenSprinkler plugin and the newest firmware for it always returns JSON. I found a thread where you posted a copy of your JSON parser above.

It works wonderfully and I want to recommend, actually require it to anyone who wants to use this new version of the OS plugin. I just wanted to see if you had a separate thread for it somewhere or if you were interested in putting one up. I would currently point people to the post here rather than put the file up myself so as not to start having multiple copies floating around. If there was one thread specifically for it, with the latest copy in the first post, then we could be sure we were getting the latest and greatest should you be updating it in the future.

What do you think?

Tried to PM you but it was blocked.

OK. I’m not sure I’m up to the responsibility of maintaining such a system component, but it is small(-ish) and fast for pure Lua, and I have found it quite reliable. It was written in response to something of a frustration with other implementations (none is perfect) and the the fact that Vera does not ship with a native JSON library.

There is, indeed, a newer version which supports UTF-8 encoded strings and pretty-printing (which several folks had asked for.). It’s already packaged as part of some of the other apps I have written here.

I’ll start a new thread as suggested.

PS: I went through a stage of not being able to respond to PMs so turned it off then.

Great, thanks akbooer.

New thread with JSON module is here: [url=http://forum.micasaverde.com/index.php?topic=29989.msg213050#msg213050]http://forum.micasaverde.com/index.php?topic=29989.msg213050#msg213050[/url]

The [tt]dmDB[/tt] functionality is now packaged as the [tt]DataYours-dataMineServer[/tt] plugin, and available from the App Store (along with the other [tt]DataYours[/tt] plugins.)