Do you do a Scheduled Daily Reboot?

resq93 · February 4, 2018, 9:09pm

How many of you do a daily or weekly power cycle of your unit? I have the ability to do this with a WiFi outlet and was wondering if there is any benefit? Wondering if this would help with the occasional Vera errors or occasional sluggishness that just seems to crop up from time to time.

cc4005 · February 4, 2018, 9:53pm

Weekly, but I do it using os.execute('reboot') rather than a power cycle. I just started a couple of weeks ago, so I can’t say yet whether it appears to help stability.

rigpapa · February 4, 2018, 9:55pm

I hammer on my Veras pretty hard when I’m working on plugins, and rarely find it necessary to do any kind of hard reset. When needed, I usually just launch /sbin/reboot via ssh. I can’t recall how long it’s been since I’ve had one in a state that only power cycling would recover (arriving in that state spontaneously; when upgrading firmware, all bets are off). It has happened, but not often, and not in a while.

HSD99 · February 4, 2018, 10:10pm

I do a scheduled daily reboot via a scene, using Vera’s OS reboot command. This ensures an orderly shutdown and restart. I used to do it weekly, but found that Vera would sometimes get a bit sluggish after four or five days. Both my Veras are connected to IP-based power switches, which are powered from a UPS. If I need to power-cycle either unit, it can be done either locally or remotely. So far that hasn’t been necessary.

The IP-power switch is designed for server control. It pings each Vera once per minute, and if 3 consecutive pings fail, it will turn off the power, wait 30 seconds, then turn it back on. I did this because I travel a great deal, and am frequently away from an internet connection. Having done all this, the power switch logs show that neither of the units has required a power cycle. But just in case… ;D
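
For anyone without such a switch, the watchdog rule described above (three consecutive failed pings, then power off, wait 30 seconds, power on) could be sketched roughly as follows. This is only an illustration of the counting logic, not the switch's actual firmware; `VERA_IP`, `power_off` and `power_on` are hypothetical placeholders for however your own outlet is controlled.

```shell
#!/bin/sh
# Sketch of a "3 consecutive failed pings" watchdog rule.
# MAX_FAILS mirrors the figure quoted in the post; all names are illustrative.
MAX_FAILS=3

# next_count PREVIOUS_COUNT PING_EXIT_STATUS
# Prints the updated failure streak: reset to 0 on success, else increment.
next_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# should_cycle COUNT
# Succeeds (exit status 0) once the streak has reached MAX_FAILS.
should_cycle() {
    [ "$1" -ge "$MAX_FAILS" ]
}

# Example loop, left commented out so running the sketch changes nothing.
# power_off / power_on are hypothetical commands for your own outlet.
# fails=0
# while sleep 60; do
#     ping -c 1 "$VERA_IP" >/dev/null 2>&1
#     rc=$?
#     fails=$(next_count "$fails" "$rc")
#     if should_cycle "$fails"; then
#         power_off; sleep 30; power_on
#         fails=0
#     fi
# done
```

Requiring a streak rather than a single failure avoids cycling power on one dropped packet.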

resq93 · February 4, 2018, 10:14pm

I could try that but I’ve found in the past that those scenes won’t run when Vera is in a Lua Error state.

M

trygve · February 5, 2018, 1:41am

I don’t. I have just over 100 devices and a Vera Edge, with just a handful of plugins. The only reboots I do are on upgrade. I’ve not seen any need to do reboots, to be honest.

HSD99 · February 5, 2018, 1:58am

[quote=“resq93, post:5, topic:198507”]I could try that but I’ve found in the past that those scenes won’t run when Vera is in a Lua Error state.

M[/quote]
You could run the reboot as a scheduled job in /etc/crontab. That would work regardless of the state of the Lua subsystem.

kwieto · February 5, 2018, 9:24am

One of my Veras is in a remote location, where I’m not present most of the time. For this unit I have a WiFi wall-plug switch (WeMo), which I can turn off and on remotely if the Vera becomes unresponsive for some reason. Over the last year I needed to use the switch once or twice.
After those cases I installed the Datamine2 and System Monitor plugins to track memory status. If everything is working correctly, the amount of free/used/cached memory should be more or less stable.
If your memory has big ups and downs or is considerably low, it is a signal that something is happening to your unit. This is how I found the wrong logging settings on my Edge, which resulted in the “can’t write user data” error.

But I don’t do reboots on a regular basis, as I don’t see any benefit in it. If the system works fine (stable memory amounts, no frequent luup reloads/restarts), I don’t think there is a need to reboot it.
Occasional sluggishness may be caused by luup reloads (you can use the System Monitor plugin to track them), and in that case rebooting doesn’t help much.

resq93 · February 5, 2018, 2:16pm

[quote=“HSD99, post:7, topic:198507”][quote=“resq93, post:5, topic:198507”]I could try that but I’ve found in the past that those scenes won’t run when Vera is in a Lua Error state.

M[/quote]
You could run the reboot as a scheduled job in /etc/crontab. That would work regardless of the state of the Lua subsystem.[/quote]

Not sure I know how to do that. Can you elaborate with instructions?

Ty

M

HSD99 · February 5, 2018, 4:19pm

http://forum.micasaverde.com/index.php/topic,6751.msg42761.html#msg42761

This is an old thread. The /etc/crontab edits were suggested by MCV. If you are not comfortable with SSHing into your Vera and editing files, you might not want to use this method. This solution probably will not survive a firmware upgrade or restoration from a backup as /etc/crontab may get overwritten.
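
To give a rough idea of what such an entry looks like, here is a minimal sketch that only builds the line; it does not touch any file. It assumes the standard five-field cron schedule (minute, hour, day of month, month, day of week). The 04:30 time is purely illustrative and not taken from the linked thread, and `cron_line` is a made-up helper name. Whether Vera's /etc/crontab expects an extra user column before the command isn't stated here, so compare against the entries already in the file before adding anything.

```shell
#!/bin/sh
# cron_line MINUTE HOUR
# Prints a five-field cron entry that runs /sbin/reboot once a day.
# Assumption: no user column; check the existing lines in your crontab.
cron_line() {
    printf '%s %s * * * /sbin/reboot\n' "$1" "$2"
}

# Example: print an entry for 04:30 every day (illustrative time only).
# cron_line 30 4
```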

kwieto · February 6, 2018, 8:29am

Additional thought about reboots

Today my Vera became unresponsive.
I rebooted it and checked memory usage in Datamine: yesterday the amount of free memory dropped significantly, consumed by an unreasonable increase in cached memory (see screenshot).
I plan to make a scene that forces the controller to reboot, not on a schedule, but based on the free memory left (i.e. warn me if it drops below 50k and do a reboot if it drops below 25k).
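
The post describes doing this in a scene; as a rough alternative illustration, the same two-threshold rule could be sketched as a shell script that reads MemFree from /proc/meminfo. Reading "50k" and "25k" as 50,000 kB and 25,000 kB is an assumption, and `free_kb` and `decide_action` are hypothetical names, not part of any Vera tooling.

```shell
#!/bin/sh
# Sketch of a two-threshold free-memory check.
# Assumption: the "50k"/"25k" figures from the post are taken as kB.
WARN_KB=50000
REBOOT_KB=25000

# free_kb [FILE]
# Prints the MemFree value (in kB) from a meminfo-style file.
free_kb() {
    awk '/^MemFree:/ {print $2}' "${1:-/proc/meminfo}"
}

# decide_action FREE_KB
# Prints "reboot", "warn" or "ok" depending on the thresholds above.
decide_action() {
    if [ "$1" -lt "$REBOOT_KB" ]; then
        echo reboot
    elif [ "$1" -lt "$WARN_KB" ]; then
        echo warn
    else
        echo ok
    fi
}

# Example wiring, commented out so running the sketch changes nothing:
# case "$(decide_action "$(free_kb)")" in
#     reboot) /sbin/reboot ;;
#     warn)   logger "Vera free memory is low" ;;
# esac
```

Keeping the decision separate from the action makes it easy to try the thresholds against recorded values before letting anything trigger a reboot.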

gniknalu · February 6, 2018, 2:52pm

Found that if I don’t do a reboot every couple of weeks, I’m forced to do it when it stops responding altogether. One hasn’t been scheduled (and thank you everyone for posting methods of HOW to do this) but it will happen. It sucks pretty bad when you’re not at home and you get an alert saying your Vera is down and you have NO WAY to fix it remotely. When I do an occasional ‘bounce’ every couple of weeks, this doesn’t happen.

Vera Secure
Firmware 1.7.3535

HSD99 · February 6, 2018, 4:56pm

Every situation is unique. There isn’t a “one size fits all” solution to the “should I schedule an automatic reboot” question. I do a daily reboot because the potential downsides (for me) are very small, while the upside is large. My experience with Vera (starting with a VL on UI5 and now two VP on UI7-1.7.3232 and 1.7.3532) is that eventually the system became unstable for whatever reason. This instability might corrupt the system to the point that a reboot didn’t help, and restoring from an older backup was the only way to regain a stable system. My daily reboot ensures that memory leaks or whatever else is happening don’t get an opportunity to become a problem. So far, this has worked well for me. I’ll be interested in seeing if kwieto’s script to reboot on low memory works for him.

kwieto · February 6, 2018, 6:46pm

I’ll post an update, but considering that something like that has happened only once (I’ve had this controller and setup for about a month), I can’t predict if and when it will happen again.
For the previous controller, I didn’t do any reboots for a couple of months, except those during updates or when moving the controller to another place.

Just for information: as I’ve checked in the Datamine records, the system was fully operational during the “offline” time. Sensors were reporting data, and power measurement was also reported correctly. That is a huge advantage of the Plus (or newer firmware) over my previous controller (Edge), where a low amount of memory caused the “can’t write user data” error and it just hung.

kwieto · February 7, 2018, 2:35pm

As usual, you can make various predictions and reality goes its own way: today the problem with the increasing amount of cached memory repeated.
The scene worked perfectly, rebooting the controller long before memory would have been drained enough to make it unstable.
The controller rebooted and was operational within two minutes.
Nevertheless, time to issue a ticket…

HSD99 · February 7, 2018, 6:45pm

[quote=“kwieto, post:15, topic:198507”]As usual, you can make various predictions and reality goes its own way: today the problem with the increasing amount of cached memory repeated.
The scene worked perfectly, rebooting the controller long before memory would have been drained enough to make it unstable.
The controller rebooted and was operational within two minutes.
Nevertheless, time to issue a ticket…[/quote]

I’ll be interested in Vera’s analysis—and congratulations that your scene accomplished its purpose!

kwieto · February 7, 2018, 8:49pm

If you want, here are error messages from the time around which the increase of cached memory started. I don’t understand this well enough to say where (or whether) there is an issue:

02      02/07/18 10:18:42.391   ZWJob_PollNode::ReceivedFrame HandlePollUpdate failed job job#1602 :pollnode #12 dev:197 (0x1b21938) N:12 P:100 S:5 Id: 1602 got after 1 seconds FUNC_ID_APPLICATION_COMMAND_HANDLER node info for 12 status 0 data 0xdd 0x14 0xc0 0xdb 0x89 0x4b 0x35 0x12 (#####K5#) <0x76377520>
24      02/07/18 10:18:42.784   ZWaveSerial::Send m_iFrameID 42927 type 0x0 command 0x13 got failure 0x18 iNumFailedResponse 1 m_iSendsWithoutReceive 0 numretriesforack 3 <0x76377520>
01      02/07/18 10:18:43.006   ZWaveNode::DecryptMessage node 12 dev 197 failed and backup nonce is 0 <0x76377520>
02      02/07/18 10:18:43.034   ZWJob_PollNode::ReceivedFrame HandlePollUpdate failed job job#1602 :pollnode #12 dev:197 (0x1b21938) N:12 P:100 S:5 Id: 1602 got after 2 seconds FUNC_ID_APPLICATION_COMMAND_HANDLER node info for 12 status 0 data 0x91 0x2d 0x82 0x4c 0x1b 0x36 0x34 0x5 (#-#L#64#) <0x76377520>
01      02/07/18 10:18:45.116   ZWaveSerial::Send m_iFrameID 42935 type 0x0 command 0x13 got repeat failure 24 iNumFailedResponse 1 time 39213116 start time 39213076 wait 2000 m_iSendsWithoutReceive 0 <0x75d77520>
01      02/07/18 10:18:49.188   ZWJob_PollNode::Run job job#1602 :pollnode #12 dev:197 (0x1b21938) N:12 P:100 S:1 Id: 1602 ZW_Send_Data to node 12 failed 5 req (nil)/-1 abort m_iFrameID 0 <0x75d77520>
02      02/07/18 10:18:49.189   ZWJob_PollNode::PollFailed job job#1602 :pollnode #12 dev:197 (0x1b21938) N:12 P:100 S:1 Id: 1602 node 12 battery 0 notlist:0 <0x75d77520>
06      02/07/18 10:18:49.193   Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: LastError was: Poll failed now: Poll failed #hooks: 0 upnp: 0 skip: 0 v:0x1143618/NONE duplicate:1 <0x75d77520>
24      02/07/18 10:18:49.193   ZWJob_PollNode::m_eJobStatus job job#1602 :pollnode #12 dev:197 (0x1b21938) N:12 P:100 S:2 Id: 1602 <0x1b21938> m_eJobStatus Failed after 10.89295000 seconds <0x75d77520>
04      02/07/18 10:18:49.194   <Job ID="1602" Name="pollnode #12 7 cmds" Device="197" Created="2018-02-07 10:18:39" Started="2018-02-07 10:18:39" Completed="2018-02-07 10:18:49" Duration="10.89295000" Runtime="10.75287000" Status="Failed" LastNote="" Node="12" NodeType="ZWaveMultiEmbedded" NodeDescription="Światło|Wisła"/> <0x75d77520>
02      02/07/18 10:18:54.995   ZWJob_SendData::ReceivedFrame job job#1603 :Wakeup done 30 dev:48 (0x1bf3c40) N:30 P:102 S:5 Id: 1603 to node 30 command 132/8 failed m_cTxStatus 1 retries 0 <0x76377520>
01      02/07/18 10:18:54.995   ZWJob_SendData::ReceivedFrame job job#1603 :Wakeup done 30 dev:48 (0x1bf3c40) N:30 P:102 S:5 Id: 1603 to node 30 command 0x84/0x08 failed 0/0 or Quit 0 <0x76377520>
24      02/07/18 10:18:54.995   ZWaveNode::m_bLastContactFailed_set device 48 = 1, force 0, m_bNotListening 1 zw poll9 <0x76377520>
24      02/07/18 10:18:54.996   ZWaveNode::m_bLastContactFailed_set device 48 skipping <0x76377520>
10      02/07/18 10:18:54.996   Job::m_sNotes_set job#1603 :Wakeup done 30 dev:48 (0x1bf3c40) N:30 P:102 S:5 Id: 1603 dataversion 955947397 changing from Waiting for node to reply after 0 retries -to- Cannot contact device, error code: 1 <0x76377520>
01      02/07/18 10:18:54.996   ZWJob_SendData::JobFailed job#1603 :Wakeup done 30 dev:48 (0x1bf3c40) N:30 P:102 S:5 Id: 1603 Priority 102 <0x76377520>
04      02/07/18 10:18:54.998   <Job ID="1603" Name="Wakeup done 30" Device="48" Created="2018-02-07 10:18:40" Started="2018-02-07 10:18:49" Completed="2018-02-07 10:18:54" Duration="14.81387000" Runtime="5.801944000" Status="Aborted" LastNote="Cannot contact device, error code: 1" Node="30" NodeType="ZWaveBinarySensor" NodeDescription="Skrzynia|Pokrywa"/> <0x76377520>

There are a lot of other messages in the log, but some of those above are red or orange (the rest are black), so I suppose they are the bigger issues (?)

HSD99 · February 7, 2018, 10:03pm

[quote=“kwieto, post:17, topic:198507”]If you want, here are error messages from the time around which the increase of cached memory started. I don’t understand this well enough to say where (or whether) there is an issue:
There are a lot of other messages in the log, but some of those above are red or orange (the rest are black), so I suppose they are the bigger issues (?)[/quote]

Is this the actual log dump, or has it been filtered for errors?

kwieto · February 7, 2018, 10:23pm

Filtered.
As the controller is in a remote location and I can’t access the log easily, I used the AltUI os.command panel button “Errors Warnings” (os.command: cat /var/log/cmh/LuaUPnP.log | grep -i -E "warning|error|failed" )
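
For anyone running the same filter over ssh rather than from AltUI, a small sketch of an equivalent helper is below. `filter_log` is just an illustrative name; the pattern and log path are the ones quoted above. It passes the file straight to grep instead of piping through cat, which gives the same result.

```shell
#!/bin/sh
# filter_log FILE
# Prints lines containing "warning", "error" or "failed", ignoring case.
filter_log() {
    grep -i -E 'warning|error|failed' "$1"
}

# Example (path as given in the post), showing only the latest 50 matches:
# filter_log /var/log/cmh/LuaUPnP.log | tail -n 50
```

Note that this kind of filter drops the surrounding context lines, so the full log may still be needed to work out what led up to a failure.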

HSD99 · February 7, 2018, 10:30pm

Thanks. I also use ALTUI for that purpose. It’s quite useful—now if Vera can tell you what it all means…