A script to reboot Vera when needed...

snovvman · March 22, 2016, 12:48am

There are plenty of threads regarding UI7 and the dreaded daily “Can’t detect device” error. For me, it pops up early in the morning and the only way to resolve it is a reboot (restarting LUUP does not do it).

I do not know scripting. I assembled the script below from various threads. I want the scene to determine if an error is present. If it is, restart Vera. Is the script below correct? Thanks.

local hasFailure = false
for devNum, devAttr in pairs(luup.devices) do
  local commfailure = luup.variable_get("urn:micasaverde-com:serviceId:HaDevice1", "CommFailure", devNum)
  if (commfailure == "1") then
    hasFailure = true
  end
end
if (hasFailure == true) then
  luup.call_action("os.execute("reboot")")
end

akbooer · March 22, 2016, 7:37am

I think this would be closer to the mark:

for devNum in pairs(luup.devices) do
  local commfailure = luup.variable_get("urn:micasaverde-com:serviceId:HaDevice1", "CommFailure", devNum)
  if (commfailure == "1") then
    os.execute "reboot"
  end
end

snovvman · March 22, 2016, 2:06pm

[quote=“akbooer, post:2, topic:191694”]I think this would be closer to the mark:

for devNum in pairs(luup.devices) do
  local commfailure = luup.variable_get("urn:micasaverde-com:serviceId:HaDevice1", "CommFailure", devNum)
  if (commfailure == "1") then
    os.execute "reboot"
  end
end[/quote]

Thank you. I’ve implemented your code into my scene.

Out of curiosity, after seeing your code, I restructured the one I originally posted. Would this have been more correct? I was wondering why it was necessary to first set the hasFailure variable to false, then check for a failure, then set the variable again, and lastly execute based on the variable's state.

Reading your profile “Less is more”, I presume this is a lot of code doing unneeded things?

Cheers.

local hasFailure = false
for devNum, devAttr in pairs(luup.devices) do
  local commfailure = luup.variable_get("urn:micasaverde-com:serviceId:HaDevice1", "CommFailure", devNum)
  if (commfailure == "1") then
    hasFailure = true
    if (hasFailure == true) then
      os.execute "reboot"
    end
  end
end

akbooer · March 22, 2016, 3:14pm

Does it work?

[quote author=snovvman]
Out of curiosity, after seeing your code, I restructured the one I originally posted. Would this have been more correct? I was wondering why it was necessary to first set the hasFailure variable to false, then check for a failure, then set the variable again, and lastly execute based on the variable's state.
Reading your profile “Less is more”, I presume this is a lot of code doing unneeded things?
[/quote]

Yes, exactly so:

- you’re not using devAttr (which, BTW, is not exactly the device attributes)
- you have no need of the logical variable hasFailure, because it’s logically equivalent to the condition (commfailure == "1")
- there is no need for the test (hasFailure == true): the preceding statement sets it to true, so it will always be so.
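
To make the points above concrete, here is a minimal sketch of the same check written as a function that stops at the first failing device. The name anyCommFailure and its two parameters are made up for illustration and are not part of the Luup API. The device table and the getter are passed in as arguments only so that the logic can be tried outside Vera.

```lua
-- Hypothetical helper (not part of Luup): returns true as soon as any
-- device reports CommFailure == "1".
--   devices      : table keyed by device number, e.g. luup.devices
--   variable_get : function (service, variable, devNum), e.g. luup.variable_get
local SID = "urn:micasaverde-com:serviceId:HaDevice1"

function anyCommFailure (devices, variable_get)
  for devNum in pairs (devices) do
    if variable_get (SID, "CommFailure", devNum) == "1" then
      return true     -- no flag variable needed: the condition is the answer
    end
  end
  return false
end

-- Assumed usage inside a scene:
--   if anyCommFailure (luup.devices, luup.variable_get) then
--     os.execute "reboot"
--   end
```

The early return plays the role that hasFailure played in the original: once one failure is found there is nothing more to learn from the remaining devices.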

The less you write, the less can go wrong. I had a programmer working for me once who said:

I don't understand what went wrong, I only changed one line!

snovvman · March 22, 2016, 8:24pm

Thank you again for the explanation. The “Can’t detect devices” error pops up 5 out of 7 days. I will observe the system and let you know if the script works. Cheers.

snovvman · March 25, 2016, 8:23pm

[quote author=akbooer link=topic=36976.msg275772#msg275772 date=1458659672]

Does it work?

[/quote]

The script appears to be working. However, I was wrong about the regularity with which this error appears. I previously thought that it was always between 5 and 6 AM, but today the error did not appear until after 6 AM. By that time, the script had already run for the day.

In consideration of that, is it possible to write a “watchdog” script so that if/when the error appears and persists for ten minutes (to ensure that it is not transient), Vera then reboots?

If that is not possible, I suppose my remaining option is to run the script once per hour or in some repetitive fashion. It’s not my preference, but I see no way out of the problem until Vera fixes the root cause.

akbooer · March 25, 2016, 10:58pm

It’s fairly straightforward to write a callback routine which monitors changes in all CommFailure variables and times how long they remain in that state.

Is that really what you think is necessary to keep your Vera going?

snovvman · March 26, 2016, 3:20pm

Undesirably, yes [for now]. This thread http://forum.micasaverde.com/index.php/topic,36482.60.html is one of many examples where owners have contacted Vera with the same problem but without a resolution. Several reported that Vera acknowledges the problem and is working on a fix. However, that has been going on since 2014.

All the issues are related to UI7. Some arose on moving from UI5 to UI7. One owner bought a Vera Plus to replace a Vera Lite after he encountered the problem with the Lite, only to find that the Plus had the same problem.

I have not examined the logs to see if there is a clue (my knowledge would be limited anyhow).

I would rather not return to UI5, so for the time being, I figure I can just trigger a reboot whenever the problem arises.

The scenario would be:

Whenever the specific type of failure appears, wait ten minutes and check whether the error persists. If it does, reboot Vera. I want the wait time because cameras sometimes show up as “not responding”, but that is temporary.

If you can provide some guidance [read: provide a script ;)], I would much appreciate it. Thanks for your continued assistance.

akbooer · March 26, 2016, 4:37pm

I imagine it would be something along these lines (run in Lua startup) …

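-- Both functions must be global, because luup.variable_watch and
-- luup.call_delay look them up by name.

CommsTable = {}   -- hash -> {time = last change, state = latest CommFailure value}

-- Watch callback: record when each device's CommFailure last changed.
function CommsMonitor (device, service, variable, _, value_new)
  local now = os.time ()
  local hash = table.concat ({device, service, variable}, '.')
  CommsTable[hash] = {time = now, state = value_new}
  luup.log (table.concat ({"Comms state change: ", hash, value_new}, ' '))
end

-- Runs once a minute: reboot if any failure has outlasted the timeout.
function CommsPulse ()
  luup.call_delay ("CommsPulse", 60, '')        -- call_delay fires only once, so reschedule
  local timeout = os.time() - 10 * 60           -- ten minute timeout
  for hash, info in pairs(CommsTable) do
    if (info.state == "1") and (info.time < timeout) then
      luup.log "rebooting due to overdue Comms failure"
      os.execute "reboot"
    end
  end
end

luup.variable_watch ("CommsMonitor", "urn:micasaverde-com:serviceId:HaDevice1", "CommFailure")
luup.call_delay ("CommsPulse", 60, '')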

…I have not tested this.
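
Since the script above is untested, one way to check the timeout logic without rebooting anything is to pull the test out of CommsPulse into a function that takes the table and the clock as arguments. This is only a sketch: overdueFailures is a made-up name, and the device numbers and times in the sample are invented. The table has the same shape that CommsMonitor builds.

```lua
-- Hypothetical helper: the same condition as CommsPulse, but it returns the
-- overdue entries instead of rebooting, and takes the clock as a parameter.
--   commsTable : hash -> {time = <seconds>, state = "0" or "1"}
--   now        : current time in seconds (os.time () on Vera)
--   maxAge     : how long, in seconds, a failure may persist before it counts
function overdueFailures (commsTable, now, maxAge)
  local overdue = {}
  for hash, info in pairs (commsTable) do
    if info.state == "1" and info.time < now - maxAge then
      overdue[#overdue + 1] = hash
    end
  end
  table.sort (overdue)   -- pairs() order is unspecified; sort for stable output
  return overdue
end

-- Example with invented device numbers and times
local SID = "urn:micasaverde-com:serviceId:HaDevice1"
local sample = {
  ["12." .. SID .. ".CommFailure"] = {time = 1000, state = "1"},  -- failed 700 s ago
  ["15." .. SID .. ".CommFailure"] = {time = 1600, state = "1"},  -- failed 100 s ago
  ["20." .. SID .. ".CommFailure"] = {time =  900, state = "0"},  -- recovered
}
for _, hash in ipairs (overdueFailures (sample, 1700, 10 * 60)) do
  print ("overdue: " .. hash)
end
```

With these sample values only the entry for device 12 is reported: device 15 has been failing for less than ten minutes, and device 20 has already recovered.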

snovvman · March 27, 2016, 5:47pm

Thank you very much! I’ll give this a try later today and report back.

brientim · March 27, 2016, 6:40pm

The issue I see is that an infinite reboot loop could arise, with a reboot every 10 minutes if a device stays failed, rendering the whole system inoperable!

Depending on the scale of the deployment, you may need to increase the time to allow more attempts.

You could increment a counter variable and check it before executing the reboot, so that after x reboots the script stops rebooting and only sends a notification.
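
A rough sketch of that counter idea, assuming the count is kept in a small file so that it survives the reboot itself. The function names, the file path and the limit are all hypothetical, and I have not checked which paths on a Vera persist across a reboot, so the path must be chosen with care.

```lua
-- Hypothetical helpers: a reboot counter stored in a file.

-- Returns the stored count, or 0 if the file is missing or unreadable.
function readCount (path)
  local f = io.open (path, "r")
  if not f then return 0 end
  local line = f:read "*l"
  f:close ()
  return tonumber (line or "") or 0
end

-- Overwrites the stored count with n.
function writeCount (path, n)
  local f = assert (io.open (path, "w"))
  f:write (tostring (n), "\n")
  f:close ()
end

-- Returns true (and records the attempt) while fewer than maxReboots have
-- been made; returns false once the limit has been reached.
function rebootAllowed (path, maxReboots)
  local n = readCount (path)
  if n >= maxReboots then return false end
  writeCount (path, n + 1)
  return true
end

-- Assumed usage in place of the bare reboot (path and limit are placeholders):
--   if rebootAllowed (counterPath, 3) then
--     os.execute "reboot"
--   else
--     luup.log "reboot limit reached, not rebooting"
--   end
```

The counter also needs resetting, for example with writeCount (path, 0) once no device reports a failure; otherwise the limit is eventually reached even on a healthy system.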

akbooer · March 27, 2016, 6:50pm

Yes, there may be more checking required, but note that the timeout only begins when the state is set to failure. I’m unsure how system startup behaves, but if the device has already failed, then the state should not be reset. It needs testing.

I really don’t approve of this approach, but needs must…

snovvman · March 28, 2016, 1:45pm

Thank you both. I have a Vera Plus inbound to replace the Lite. I will first make the upgrade, see how things go, and perhaps work with support before I apply this script. I agree that this solution is not nearly ideal. The UI5 to UI7 move has claimed many victims…