Random Rebooting

For months now I have noticed my scenes failing to trigger. I still don’t know why they sometimes trigger and sometimes don’t.

What I have noticed now, though, is that when I look at graphs in DataMine, it often reports that the Vera has rebooted and DataMine needs refreshing…

This got me thinking, is the Vera rebooting when the scenes are not firing?

I wonder how many times a day my Vera reboots. Has anyone got a good way of logging reboots/panics/crashes etc.? I would like to leave it running for a few days to see how many reboots occur each day, and when.

@akbooer’s EventWatcher will log your restarts if you set LogDirectory to a Vera folder. See: EventWatcher.

However, a common cause of random reboots is running out of free memory. Adding another plugin is likely to make this worse. I would check your memory situation before adding anything more to your system.
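If you would rather not add anything that stays resident, a few lines of Startup Lua can do a similar job. This is a minimal sketch, assuming a USB drive mounted at `/mnt/usb` (a hypothetical path; adjust it to your setup). It appends the date/time and the OS uptime to a file each time the Luup engine starts; a small uptime value suggests a full OS reboot rather than just a LuaUPnP restart.

```lua
-- Minimal restart logger for Startup Lua (sketch).
-- ASSUMPTION: a USB drive is mounted at /mnt/usb; change RESTART_LOG to suit.
RESTART_LOG = "/mnt/usb/luup_restarts.log"

-- Read the OS uptime in seconds from /proc/uptime; nil if unavailable.
function readUptime()
  local f = io.open("/proc/uptime", "r")
  if not f then return nil end
  local secs = f:read("*n")
  f:close()
  return secs
end

-- Append one line per Luup engine start: date/time plus OS uptime.
-- Returns true on success, or false plus an error message.
function logRestart(path)
  local f, err = io.open(path, "a")
  if not f then return false, err end
  local up = readUptime()
  f:write(os.date("%Y-%m-%d %H:%M:%S"), " luup start, os uptime=",
          up and tostring(math.floor(up)) or "?", "s\n")
  f:close()
  return true
end

logRestart(RESTART_LOG)
```

Writing a line per start keeps the file tiny, and you can count entries per day afterwards with any text tool.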

@Rex: Do you know what can be done to overcome memory issues? I’ve got the Vera Lite, and I guess an upgrade to Vera 3 would help (double the memory?). However, as it is only 14 days since I got my Vera Lite in the first place, it might be a bit hasty :wink:

I have heard that USB logging should help somewhat, but is it enough? My LuaUPnP process on Vera takes up 185% MEM. Can this be brought down by adding swap on USB, or is that a dead end?

If I choose to buy the Vera 3, can the Vera Lite be used to extend the network, and are there any advantages to having two in the same house?

Thanks,
Martin

Adding a USB drive and enabling USB logging will free up a lot of memory; my advice would be to do that first.

As @mfp advises, USB logging is well worth doing. Actually I think it is pretty much essential on a VeraLite.

The next move is to see whether you have any plugins that you really don’t need. I have also found that uninstalling plugins does not remove the files so it is worth a little clean-up of /etc/cmh-ludl/.

I did make the switch from VeraLite to Vera3 to get out of the memory bind. I shall redeploy the Lite to my workshop in due course. I’m not sure that is better than one big controller but, for the next few months, it is what we have to work with.

All the previous advice is recommended. I was having similar issues to yours.

About once a week, when I came home, the system wouldn’t initiate the “come home” actions. After some research, I found that Vera was slowly running out of memory due to memory leaks and such, and when I came home, all the various scripts and actions would cause Vera to run out of memory and restart. So I don’t think Vera is restarting when the scenes don’t fire; rather, Vera is restarting when a scene fires and causes it to run out of memory.

I got rid of several little used plugins and Vera has far less memory issues now.

You may also want to check your NetworkMonitor.log.

NetworkMonitor uses a series of HTTP calls to validate that the LuaUPnP process is running. If these calls fail, or fail to return in sufficient time, it will attempt a number of retries… before REBOOTING the OS. This is not a mere restart of the LuaUPnP process, but an entire OS-level reboot (as measured by [tt]uptime[/tt] et al.).

Unfortunately, I’ve seen bugs in that logic where, once it fails “once”, it keeps failing on the subsequent checks and performs the OS-level reboot.

A healthy set of these looks like this:

10 01/10/14 15:55:55.598 FileUtils::ReadURL starting http://127.0.0.1/port_49451/data_request?id=lu_alive <0x2b802460>
10 01/10/14 15:55:55.608 FileUtils::ReadURL 0/resp:200 size 2 http://127.0.0.1/port_49451/data_request?id=lu_alive <0x2b802460>
10 01/10/14 15:55:55.609 FileUtils::ReadURL starting http://127.0.0.1/port_3480/data_request?id=lu_alive <0x2b802460>
10 01/10/14 15:55:55.621 FileUtils::ReadURL 0/resp:200 size 2 http://127.0.0.1/port_3480/data_request?id=lu_alive <0x2b802460>

In my case, I had a few critical scenes that would run for 1-5 seconds, and if the NetworkMonitor process kicked in during that time, there was a high likelihood of Vera rebooting… the scene wouldn’t complete, and that was the obvious sign this was going on.

For me, the solution was a combination of excluding failing Z-Wave nodes (battery stuff, mostly) along with some tuning of all the components (each Plugin) and lowering the log levels to something more normal.

… basically anything that would cause the LuaUPnP process, or the overall OS, to slow down and/or become non-responsive - even if only for short periods of time.

I spent a lot of quality time in the LuaUPnP.log file, as well as monitoring/mimicking the logic of NetworkMonitor to prove how faulty it was (all in r1.5.621/622).
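For anyone wanting to do similar monitoring, here is a minimal sketch (not NetworkMonitor’s actual code) that pairs the lu_alive “starting” and “resp:” lines from NetworkMonitor.log and reports each request’s HTTP status and latency, flagging any request that never got a reply. The line layout is assumed from the healthy sample above, the function names are hypothetical, and requests that span midnight are not handled.

```lua
-- Sketch: pair "ReadURL starting" / "ReadURL ... resp:" lines from NetworkMonitor.log.
-- ASSUMPTION: line layout matches the healthy sample in this thread.

-- "HH:MM:SS.mmm" -> milliseconds since midnight (nil if it doesn't parse).
function toMillis(hms)
  local h, m, s, ms = hms:match("^(%d+):(%d+):(%d+)%.(%d+)$")
  if not h then return nil end
  return ((tonumber(h) * 60 + tonumber(m)) * 60 + tonumber(s)) * 1000 + tonumber(ms)
end

-- Returns a list of { url = ..., status = <HTTP code or nil>, ms = <latency or nil> }.
function checkAlive(lines)
  local pending, results = {}, {}
  for _, line in ipairs(lines) do
    local t, url = line:match("^%d+%s+%S+%s+(%S+)%s+FileUtils::ReadURL starting (%S+)")
    if t then
      pending[url] = toMillis(t)
    else
      local t2, code, url2 =
        line:match("^%d+%s+%S+%s+(%S+)%s+FileUtils::ReadURL %d+/resp:(%d+) size %d+ (%S+)")
      if t2 and pending[url2] then
        results[#results + 1] = { url = url2, status = tonumber(code),
                                  ms = toMillis(t2) - pending[url2] }
        pending[url2] = nil
      end
    end
  end
  -- Requests with no matching response: the failure case NetworkMonitor retries on.
  for url in pairs(pending) do
    results[#results + 1] = { url = url, status = nil, ms = nil }
  end
  return results
end

-- Usage: read a log file and print anything that is not a prompt 200.
function reportLog(path, slowMs)
  local lines = {}
  for line in io.lines(path) do lines[#lines + 1] = line end
  for _, r in ipairs(checkAlive(lines)) do
    if r.status ~= 200 or r.ms > slowMs then
      print(r.url, r.status or "no response", r.ms or "-")
    end
  end
end
```

On the healthy sample above, both requests come back as status 200, in 10 ms and 12 ms respectively.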

[quote=“RexBeckett, post:2, topic:178773”]@akbooer’s EventWatcher will log your restarts if you set LogDirectory to a Vera folder. See: EventWatcher.

However, a common cause of random reboots is running out of free memory. Adding another plugin is likely to make this worse. I would check your memory situation before adding anything more to your system.[/quote]

Just wondering if I could use Prowl code in the Startup Lua to send an alert notification on every restart?

Before I play with the Luup Startup code (under Apps/Dev), do you know if the code stored there is run last in the restart process, so the IP and network connection will be in place?

luup.inet.wget("http://www.prowlapp.com/publicapi/add?apikey=<insertyourprowlapikeyhere>&application=Vera+System+Notification&event=Alert&description=The+Vera+Luup+Engine+Has+Been+Restarted&priority=-1") return true

Does the code look OK to go in there directly as it is above?

Actually, is this needed? I guess I could just use code to run a scene that does the same thing.

[quote]Before I play with the Luup Startup code (under Apps/Dev), do you know if the code stored there is run last in the restart process, so the IP and network connection will be in place?[/quote]
It is run very early in the restart process, so you may need to delay the execution of your code until everything has been initialised.
[quote]Does the code look OK to go in there directly as it is above?[/quote]
Do not include the [i]return[/i] or you’ll kill any code that follows it.

Try:

-- Called by name from luup.call_delay, so it must be a global function.
function sendProwl()
  luup.inet.wget("http://www.prowlapp.com/publicapi/add?apikey=<insertyourprowlapikeyhere>&application=Vera+System+Notification&event=Alert&description=The+Vera+Luup+Engine+Has+Been+Restarted&priority=-1")
end
-- Wait 60 seconds so the network is up before sending.
luup.call_delay("sendProwl", 60)

Thanks @rex - it works a treat… :slight_smile:

A last thing you could do (when you have a fast USB stick) is create a swapfile, which gives Vera a bit more memory to use. It has saved me some daily reboots.

The steps from the Wiki also work on Vera3. [url=http://wiki.micasaverde.com/index.php/USB_swapfile_creation]http://wiki.micasaverde.com/index.php/USB_swapfile_creation[/url]

OK, so I added a Prowl notification to the Startup Lua, and last night between 9 PM and 9 AM my Vera 3 rebooted 24 times.

So, armed with this info, I just sat there eating my breakfast watching tail -f LuaUPnP.log until it crashed. It did not take long.

2014-01-12 11:07:24 - LuaUPnP Terminated with Exit Code: 245

2014-01-12 11:07:24 - LuaUPnP crash

01 2014-1-12 11:7:24 caught signal 11 <0x313d2680>

So then I found another thread on memory, as it was mentioned above, and this is my top output:

Mem: 85688K used, 41768K free, 0K shrd, 0K buff, 38272K cached
CPU: 22% usr 10% sys 0% nic 62% idle 0% io 0% irq 4% sirq
Load average: 0.46 0.59 0.74 5/113 9349
PID PPID USER STAT VSZ %MEM %CPU COMMAND
28765 2298 root S 131m 105% 22% /usr/bin/LuaUPnP
2084 1 root S 748 1% 4% /usr/bin/luci-bwc -d
6347 2433 root S 7028 6% 0% /usr/bin/NetworkMonitor

So I am guessing it is a RAM issue.
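One thing worth knowing when reading this output: on BusyBox `top`, the %MEM column appears to be the process’s VSZ (virtual size) divided by total RAM (the used + free figures on the Mem: line), not its resident memory, which is why it can go above 100%. The short Lua check below (helper names are hypothetical) reproduces the figures above: 131m of VSZ against 85688K + 41768K = 127456K total comes to about 105%.

```lua
-- Convert a BusyBox top size field ("131m", "7028" in KiB, "1g") to KiB.
function sizeToKiB(field)
  local n, unit = field:match("^(%d+)([mg]?)$")
  if not n then return nil end
  n = tonumber(n)
  if unit == "m" then return n * 1024 end
  if unit == "g" then return n * 1024 * 1024 end
  return n
end

-- VSZ as a percentage of total RAM (the "Mem: used + free" header figures).
function vszPercent(field, usedKiB, freeKiB)
  return 100 * sizeToKiB(field) / (usedKiB + freeKiB)
end

-- Figures from the top output above.
-- prints (tab-separated): LuaUPnP 105%, luci-bwc 1%, NetworkMonitor 6%
for _, p in ipairs({ { "LuaUPnP", "131m" }, { "luci-bwc", "748" }, { "NetworkMonitor", "7028" } }) do
  print(p[1], string.format("%.0f%%", vszPercent(p[2], 85688, 41768)))
end
```

Because the m suffix is a rounded display, a recomputed value can differ from top’s by about a point. In other words, a %MEM above 100% is not on its own proof that RAM is exhausted; the free figure on the Mem: line is probably the better guide.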

I will dig out a USB hub and add one or two more USB drives for Vera logging and possibly a page file. The 2 USB ports are currently used by an RFXtrx433 and a USB stick for DataMine.

I removed all the plugins that I did not need, but it made no difference to LuaUPnP %MEM.

NetworkMonitor.log is empty.

Thanks all. Will give these bits a try!

Hi @andyvirus

Wow, that is a lot!! During that period I have had no restarts.

It looks like you have loads of free memory, and your LuaUPnP process’s virtual memory use is nowhere near as bad as mine (199%).

CPU, however, seems high on that process alone; mine is never more than 11% at peak.

Using the EventWatcher plugin to graph RAM and CPU confirms to me how much the resources available to Vera fluctuate (I bounce from 15 MB of RAM free to 3 MB, as you may have seen in that memory utilisation thread).

I do wonder if you have some badly behaving code still stored somewhere, maybe remnants from testing a plugin or something?

Also, have you logged a call with MCV for them to look at it?

Hi @andyvirus,

That is not the normal picture of a major memory issue as @parkerc has observed. Do you have any scenes that could run for long periods and/or use luup.sleep(…) in them?
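For context, `luup.sleep()` blocks the thread running the scene for the whole delay, so a long sleep can leave the engine slow to answer, which is the kind of stall the lu_alive checks described earlier react to. A common alternative is to split the scene and let `luup.call_delay()` run the second half later. A minimal sketch, assuming a hypothetical binary light with device number 42; the `if luup` guard is only there so the snippet can be loaded outside Vera.

```lua
-- Sketch: "light on, then off 30 s later" without luup.sleep().
-- ASSUMPTION: device 42 is a hypothetical binary light (SwitchPower1 service).
HALL_LIGHT = 42
SWITCH_SID = "urn:upnp-org:serviceId:SwitchPower1"

-- Set the light to "1" (on) or "0" (off).
function setHallLight(value)
  luup.call_action(SWITCH_SID, "SetTarget", { newTargetValue = value }, HALL_LIGHT)
end

-- Must be a global function: luup.call_delay() looks callbacks up by name.
function hallLightOff()
  setHallLight("0")
end

-- Instead of setHallLight("1"); luup.sleep(30000); setHallLight("0"):
if luup then  -- guard only so the snippet can be loaded outside Vera
  setHallLight("1")
  luup.call_delay("hallLightOff", 30)  -- returns immediately; runs later
end
```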

I have not logged a call yet, but I will. It’s only this morning that I realised just how many restarts my Vera 3 is doing. I was going to get the Vera logs going to USB to see if that brought it under 100%.

It is possible that remnants of old plugins are there, as I get PLEG errors in my log all the time, though PLEG works fine. I also tried to uninstall a Foscam camera plugin, which said it had uninstalled, but the cameras remain in Devices and I can’t remove them. It could all be connected, but I am trying one thing at a time.

Still waiting for MiCasaVerde to come back to me about my Horstmann ASR-ZW2 losing its capabilities, which stops my heating and/or hot water from switching to HeatOn. Once I notice, it’s simply a case of hitting Configure Now and they come back, but by that point the house is cold or I am having a very cold shower. I now think that may be directly related to the rebooting. But 5 days later I am still waiting. I will open another call with them now, though. This has been going on for as long as I have used Vera, just not this frequently.

On the PLEG thing, I get this all the time. It started after an upgrade from, I think, 4.1 to 4.2 or 4.3. I’m using 5.5 now and PLEG itself is fine; I just have stuff like this in the log all the time:

01 01/12/14 11:06:53.020 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x30fd2680>
01 01/12/14 11:06:53.030 luvd_get_info /etc/cmh-ludl/S_ProgramLogicC.xml.lzo doesn’t exist <0x313d2680>
02 01/12/14 11:06:53.103 LOG_CHECK_MEMORY_LEAK pMem start 0x22c7000 now 0x24ef000 last 0x252c000 leaked 2260992 <0x2bd91680>
02 01/12/14 11:06:53.710 ZW_Send_Data node 23 NO ROUTE (nil) <0x2bf91680>
06 01/12/14 11:06:53.850 Device_Variable::m_szValue_set device: 300 service: urn:upnp-org:serviceId:TemperatureSensor1 variable: CurrentTemperature was: 23 now: 23 #hooks: 1 upnp: 0 v:0x10664c8/NONE duplicate:1 <0x2bd91680>
01 01/12/14 11:06:54.131 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x313d2680>
01 01/12/14 11:06:54.837 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x30fd2680>
01 01/12/14 11:06:54.842 luvd_get_info /etc/cmh-ludl/S_ProgramLogicC.xml.lzo doesn’t exist <0x313d2680>
01 01/12/14 11:06:54.847 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x30fd2680>
01 01/12/14 11:06:55.242 luvd_get_info /etc/cmh-ludl/S_ProgramLogicC.xml.lzo doesn’t exist <0x30fd2680>
02 01/12/14 11:06:55.853 ZW_Send_Data node 23 NO ROUTE (nil) <0x2bf91680>
01 01/12/14 11:06:55.948 luvd_get_info /etc/cmh-ludl/S_ProgramLogicC.xml.lzo doesn’t exist <0x30fd2680>
04 01/12/14 11:06:55.989 <0x2bd91680>
50 01/12/14 11:06:57.822 luup_log:10: RFXtrx: Received message: 0D 59 01 61 25 00 06 00 65 00 00 00 00 49 <0x315d2680>
06 01/12/14 11:06:57.823 Device_Variable::m_szValue_set device: 10 service: urn:rfxcom-com:serviceId:rfxtrx1 variable: LastReceivedMsg was: 0A 52 01 60 B4 01 00 CC 28 02 59 now: 0D 59 01 61 25 00 06 00 65 00 00 00 00 49 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x315d2680>
02 01/12/14 11:06:57.825 luup_log:10: RFXtrx: Decoding not yet implemented for message 0D 59 01 61 25 00 06 00 65 00 00 00 00 49 <0x315d2680>
01 01/12/14 11:07:01.401 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x30fd2680>

By far the most activity on my Vera is PLEG and PLTS, as I have quite a few PLTS instances and 3 PLEG instances; both are core functionality in my setup now. I should see if there is a way to back up the PLTS and, most importantly, the PLEG configuration, then remove and reinstall PLEG to see if this fixes it, but I really dread having to set up PLEG again, as it’s been months of tweaking and is quite huge now :slight_smile:

Redbull and Vera = Ramble, sorry.

I will go and have a look now to see if I have any scenes that use luup.sleep. I did use it once upon a time, before PLEG, but thought I had deleted most Vera native scenes by now… They could still be there, though…

Don’t worry about the log entries like:

01 01/12/14 11:06:53.020 luvd_get_info /etc/cmh-ludl/S_ProgramLogicTS.xml.lzo doesn’t exist <0x30fd2680>

They happen because the referenced files are encrypted and this confuses Vera but doesn’t stop it working. RTS is planning to remove the encryption in the next release.
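The LOG_CHECK_MEMORY_LEAK lines in that same excerpt may be more relevant to the memory question. In the sample, the leaked figure is exactly now minus start (0x24ef000 - 0x22c7000 = 2260992 bytes, about 2.2 MiB). Below is a minimal sketch (hypothetical helper names) that pulls those numbers out of LuaUPnP.log so you can see whether the figure keeps growing between restarts. What pMem actually measures is not documented in this thread, so treat the result as a trend indicator only.

```lua
-- Extract start/now/leaked from a LOG_CHECK_MEMORY_LEAK line (nil if absent).
function parseLeakLine(line)
  local s, n, leaked = line:match("pMem start 0x(%x+) now 0x(%x+) last 0x%x+ leaked (%d+)")
  if not s then return nil end
  return { start = tonumber(s, 16), now = tonumber(n, 16), leaked = tonumber(leaked) }
end

-- Collect the leaked figure from every matching line of a log file, in order.
function leakTrend(path)
  local trend = {}
  for line in io.lines(path) do
    local rec = parseLeakLine(line)
    if rec then trend[#trend + 1] = rec.leaked end
  end
  return trend
end
```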

Awesome. One less thing to worry about. Not that I was too worried about them anyway. Good to know what is going on, though. Thanks!

OK, I added a USB hub and another USB drive for Vera logging; the vmem usage went up for LuaUPnP, but overall Mem: usage reduced.

Mem: 68480K used, 58976K free, 0K shrd, 1176K buff, 20272K cached
CPU: 15% usr 15% sys 0% nic 61% idle 1% io 0% irq 5% sirq
Load average: 0.74 0.88 0.60 1/125 11453
PID PPID USER STAT VSZ %MEM %CPU COMMAND
2438 2387 root S 143m 116% 18% /usr/bin/LuaUPnP

Next I’ll look at a paging file on another USB drive…

Could overuse of the “NOW” variable be causing high vmem usage?

I do use “now” quite a lot

Is there a better way to say:

(AnyMotion; NOW < 18:00:00)

Basically I want to say if there has been any motion in the last 18 hours, do or don’t do something.

I use the same expression in each room for each PIR. AnyMotion is just a condition that updates whenever any of my PIRs is triggered.

I would like to use some trigger other than NOW, as that fires every 60 seconds. I don’t want to use another schedule, as it could still be evaluating unnecessarily every x, or it could make the trigger way too slow if I choose too long a schedule.

If you have a large number of Conditions being evaluated every minute it could put an excessive load on Vera. Depending on what logic you are implementing, it is sometimes possible to avoid the use of Now by using Trigger/Condition timestamps:

NoMotion Not Motion
AnyMotion Motion and (NoMotion; Motion < 1:00:00)

Where you are looking for a long time period, it would be much more efficient to start a Self-ReTriggered timer for the period rather than evaluate the same condition hundreds of times.
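The same idea carries over if any of this logic ever moves into Lua: record when motion last happened and compare timestamps only when something needs the answer, instead of re-evaluating every 60 seconds. This is a minimal sketch, not PLEG itself; the names are hypothetical, and the motion hook (for example a `luup.variable_watch()` on a sensor’s Tripped variable) is left out.

```lua
-- Sketch: "any motion in the last 18 hours?" from a stored timestamp.
MOTION_WINDOW = 18 * 60 * 60  -- 18 hours, in seconds
lastMotion = nil              -- os.time() of the most recent motion event

-- Call from whatever fires on motion; 'now' is optional (handy for testing).
function onMotion(now)
  lastMotion = now or os.time()
end

-- Evaluated on demand (e.g. when a scene runs), not on a 60-second tick.
function motionWithinWindow(now)
  now = now or os.time()
  return lastMotion ~= nil and (now - lastMotion) < MOTION_WINDOW
end
```

Note that a plain Lua global like `lastMotion` is lost on every Luup restart, which matters on a system restarting as often as the one in this thread.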