Continuous reboots on Vera Lite

I all of a sudden got a problem with continuous reboots. I have removed unused plugins (I’m not using ERGY) and installed a USB hub and stick and enabled logging to it. Power cycle didn’t help. I have a modest amount of plugins and only one PLEG instance so far.

I would really appreciate some help on this. I can’t tell what’s going wrong from reading the log but hopefully it makes sense to someone else. I’ve pasted the last bit before for the LuaUPNP crash message. I’m happy to post more of the log if needed.

[code]02 01/18/14 19:50:31.103 ZW_Send_Data node 13 NO ROUTE (nil) <0x2b77f680>
04 01/18/14 19:50:31.242 LEAK this:73728 start:6901760 to 0x18bd000 <0x2b37f680>
01 01/18/14 19:50:33.003 Failed to get lock(0x107a84c) Lua: LuaInterface.cpp:1686 last used LuaInterface.cpp:1990 first used LuaInterface.cpp:1990 thread: 0x2d77f680 (>5857) handler 0x424934 bOkToFail 0 LEAK this:221184 start:7122944 to 0x18f3000 <0x2ac4c000>
02 01/18/14 19:50:33.005 Dumping 25 locks <0x2ac4c000>
02 01/18/14 19:50:33.005 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.005 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.006 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.006 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.006 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.006 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.006 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.007 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.007 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.007 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.007 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.008 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.008 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.008 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.008 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.009 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.009 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.009 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.009 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.010 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.010 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.010 OL: (0x107a84c) (>5857) Lua LuaInterface.cpp l:1990 time: DL7:49:32p (61 s) thread: 0x2d77f680 Rel: N Got: Y <0x2ac4c000>
02 01/18/14 19:50:33.010 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.011 OL: (0xa9ad9c) (>6216) ThreadedClass …/JobHandler/JobHandler.cpp l:404 time: 7:49:33p (60 s) thread: 0x2ac4c000 Rel: Y Got: Y <0x2ac4c000>
02 01/18/14 19:50:33.021 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.022 OL: (0x107a84c) (>6220) Lua LuaInterface.cpp l:1686 time: 7:49:33p (60 s) thread: 0x2ac4c000 Rel: N Got: N <0x2ac4c000>
02 01/18/14 19:50:33.022 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.022 OL: (0xa9ae0c) (>10254) Variable JobHandler_LuaUPnP.cpp l:6848 time: 1:00:00a (1390071033 s) thread: 0x2f57f680 Rel: Y Got: Y <0x2ac4c000>
02 01/18/14 19:50:33.023 finished check for exceptions <0x2ac4c000>
02 01/18/14 19:50:33.023 OL: (0xd21c6c) (>10305) ThreadedClass …/JobHandler/JobHandler.cpp l:404 time: 1:00:00a (1390071033 s) thread: 0x2b77f680 Rel: Y Got: Y <0x2ac4c000>
01 01/18/14 19:50:33.023 Deadlock problem. going to reload and quit <0x2ac4c000>
03 01/18/14 19:50:33.023 JobHandler_LuaUPnP::Reload: deadlock Critical 1 m_bCriticalOnly 0 dirty data 1 <0x2ac4c000>
02 01/18/14 19:50:33.922 luup_log:7: RFXtrx: Bad starting message; ignore byte 60 LEAK this:200704 start:7323648 to 0x1924000 <0x2f37f680>
06 01/18/14 19:50:33.923 Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: NetStatusID was: 1 now: 2 #hooks: 0 upnp: 0 v:0xdee098/NONE duplicate:0 <0x2ac4c000>
06 01/18/14 19:50:33.924 Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: NetStatusText was: OK now: Resetting ZWave Network #hooks: 0 upnp: 0 v:0xdee150/NONE duplicate:0 <0x2ac4c000>
01 01/18/14 19:50:40.358 Main WatchDogRoutine: blocked - terminating 1 LEAK this:-659456 start:6664192 to 0x1883000 <0x2e17f680>
02 01/18/14 19:50:40.359 Dumping 21 locks <0x2e17f680>
02 01/18/14 19:50:40.360 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.360 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.360 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.360 finished check for exceptions LEAK this:73728 start:6737920 to 0x1895000 <0x2e17f680>
02 01/18/14 19:50:40.381 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.381 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.381 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.382 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.382 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.382 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.382 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.383 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.383 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.383 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.383 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.384 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.384 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.384 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.385 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.385 OL: (0x107a84c) (>5857) Lua LuaInterface.cpp l:1990 time: DL
7:49:32p (68 s) thread: 0x2d77f680 Rel: N Got: Y <0x2e17f680>
02 01/18/14 19:50:40.385 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.385 OL: (0xa9ad9c) (>6216) ThreadedClass …/JobHandler/JobHandler.cpp l:404 time: 7:49:33p (67 s) thread: 0x2ac4c000 Rel: Y Got: Y <0x2e17f680>
02 01/18/14 19:50:40.386 finished check for exceptions <0x2e17f680>
02 01/18/14 19:50:40.386 OL: (0x107a84c) (>6220) Lua LuaInterface.cpp l:1686 time: 7:49:33p (67 s) thread: 0x2ac4c000 Rel: N Got: N <0x2e17f680>

2014-01-18 19:50:40 - LuaUPnP Terminated with Exit Code: 137

2014-01-18 19:50:40 - LuaUPnP crash[/code]

When Vera “self-terminates”, as it did here, it’s typically due to an event that’s [at least] 60 seconds prior to the termination (Exit Code: 137 (137-128==9, or SIGKILL)…

The earlier parts of the log would likely be needed for a meaningful eval of what contributed to the fail.

Thanks for your input, guessed. I’ve extracted the log between reboots and attached it.

First thing is turn off that Energy Plugin … it is notorious for destabilizing a system.

Yup. They [ERGY] used to have a habit of dumping a “lot” of your device information, far more than simply “energy”, into their servers so there are many (!) reasons not to run their stuff on a system.

… esp one that’s running as much as you are, there’s a lot going on given the number of plugins, and devices attached to these plugins, you have on your VL.

Also, it looks like Z-Wave nodes 7 & 11 (nodeId’s, not deviceId’s) are having contact issues, so you may want to see if you can improve that.

A few side-notes:
a) There are a bunch of [tt]GetSunriseSunset[/tt] errors at the very beginning
I’ve never seen these before, so they could be benign, but it may be worthwhile checking your Location settings in the Control Panel, to see if they’re valid.
b) The log should be edited to remove the PII associated with Google Calendar, which appears to be failing anyhow.
c) The RFX Plugin should be modified to use set-if-changed logic, to reduce log output.
There are others also, but the RFX is the largest generator of data in your logs.

I’ve re-attached your log file, with the most obvious bits of PII removed… just to keep things safe for you :wink:

If you do not already have a memory stick … you should install a memory stick and send the logs to it.
This will cause problems with the datamine plugin you are using. But I think you are doing way more than your Vera can handle.

It look so like his dataMine is on NAS so he should be good to go for USB logs without issue.

[quote=“guessed, post:5, topic:178944”]Yup. They [ERGY] used to have a habit of dumping a “lot” of your device information, far more than simply “energy”, into their servers so there are many (!) reasons not to run their stuff on a system.

… esp one that’s running as much as you are, there’s a lot going on given the number of plugins, and devices attached to these plugins, you have on your VL.

Also, it looks like Z-Wave nodes 7 & 11 (nodeId’s, not deviceId’s) are having contact issues, so you may want to see if you can improve that.

A few side-notes:
a) There are a bunch of [tt]GetSunriseSunset[/tt] errors at the very beginning
I’ve never seen these before, so they could be benign, but it may be worthwhile checking your Location settings in the Control Panel, to see if they’re valid.
b) The log should be edited to remove the PII associated with Google Calendar, which appears to be failing anyhow.
c) The RFX Plugin should be modified to use set-if-changed logic, to reduce log output.
There are others also, but the RFX is the largest generator of data in your logs.[/quote]
I started by removing the GCal plugin which actually seemed to have solved the problem . CPU load dropped a lot. I have had it installed for a long time without actually using it so I don’t know what has happened to it lately.

Thanks for removing GCal PII. Is there any other kind of info that generally should be removed before posting logs? I guess all properties of a plugin might be seen in clear text?

I did activate ERGY when I first got the VL but quickly uninstalled it when I read about the bad things it could do. After I got the reboot issue I did activate it/unistall it again just to be on the safe side it was gone. Are there still traces left of it in my system? Should I do a complete reset of my VL?

I am logging dataMine to NAS and when I got the reboot problem I installed a USB stick via a USB hub (sharing with RFXtrx) for the Vera logs.

I checked my location settings and they are correct. I changed it to a different city to test but I get the same error. Though, the time for sunrise/sunset seem to be found as they are written in the log. I believe my scenes that are dependent on sunset are working.

02	01/19/14 6:00:03.573	GetSunriseSunset error#1 sunrise: 1390031027/2014-01-18 08:43:47 sunset: 1390057604/2014-01-18 16:06:44 midnight: 1390086000/7855852 <0x2b193000>

Nodes 7 (Fibaro wall plug) and 11 (Aeon smart metering switch) are located in the garage so it makes sense that they could have problems with communication. Though, I can’t see that they ever fail to receive on/off command or report their consumption. Polling is set to default 60s. I do get error messages in the log when turning on/off. It seems strange as I don’t see in the GUI that any resends are needed.

02 01/19/14 5:45:14.592 ZWJob_SendData UPDATE MANUAL ROUTE 11=(nil) <0x2f633680> 02 01/19/14 5:45:14.596 UPDATE MANUAL ROUTE2 11=(nil) <0x2c033680> 02 01/19/14 5:45:14.596 ZW_Send_Data node 11 NO ROUTE (nil) <0x2c033680> 02 01/19/14 5:45:14.691 UPDATE MANUAL ROUTE2 11=(nil) <0x2c033680> 02 01/19/14 5:45:14.691 ZW_Send_Data node 11 NO ROUTE (nil) <0x2c033680>

What’s strange is that node 13, which is a Fibaro dimmer, is reporting the same thing that that device is located indoors close to VL and doesn’t seem to have a real communication issue either. These three devices are the only non-battery powered Z-wave devices I have and all seem to have the same problem which I think is suspicious.

Regarding the RFXtrx plugin. is there something I can do to reduce the log output or should I contact the plugin owner?

I hope I’m not close to overusing my VL as I’m just getting started. I am logging 32 channels using dataMine, most of them to monitor energy usage. There might be a better way to do this but for now I would like to use dataMine as it’s so quick and easy.

I really appreciate the help from you, guys. Hopefully I can stabalize my system a bit more.

I had frequent reboots until recently. There seems to be many causes for this.

Uninstalling plugins does not remove the files I have found. Once I cleared the uninstalled plugins files manually from the CLI my system seemed to stabilize. However, I cannot guarantee that this was indeed the fix for me.

If you go down this route, be careful. MCV may not support you, and you could in theory cause other issues.

No problem. I look for sensitive stuff like keys, email addresses, auth tokens, NAS mount passwords and just scrub them out.

Developer/Debug logs tend to have more of this stuff.

I did activate ERGY when I first got the VL but quickly uninstalled it when I read about the bad things it could do. After I got the reboot issue I did activate it/unistall it again just to be on the safe side it was gone. Are there still traces left of it in my system? Should I do a complete reset of my VL?
You should be fine. It's been some time since I've run Ergy, and it's removal (along with adding USB etc) are all part of a generic package of recommendations that are made to folks that are pushing the boundaries. Most of these date back to Vera1, or Vera2, when the resources were even more squished. We've even had people run with SWAP on the USB devices, but you know you're nearing the end of the rope when you start doing that ;)

… and with the combination of plugins, and the amount they’re collectively managing, you’re heading down the “pushing boundaries” path. :wink:

02	01/19/14 6:00:03.573	GetSunriseSunset error#1 sunrise: 1390031027/2014-01-18 08:43:47 sunset: 1390057604/2014-01-18 16:06:44 midnight: 1390086000/7855852 <0x2b193000>

Yeah, not sure what’s causing that.

Nodes 7 (Fibaro wall plug) and 11 (Aeon smart metering switch) are located in the garage so it makes sense that they could have problems with communication. Though, I can't see that they ever fail to receive on/off command or report their consumption. Polling is set to default 60s. I do get error messages in the log when turning on/off. It seems strange as I don't see in the GUI that any resends are needed.

I was looking more at these retry blocks:

02 01/18/14 19:49:31.104 ZW_Send_Data node 7 NO ROUTE (nil) <0x2b77f680> 02 01/18/14 19:49:33.423 ZW_Send_Data node 7 NO ROUTE (nil) <0x2b77f680> 02 01/18/14 19:49:35.744 ZW_Send_Data node 7 NO ROUTE (nil) __LEAK__ this:1998848 start:6111232 to 0x17fc000 <0x2b77f680> 02 01/18/14 19:49:38.085 ZW_Send_Data node 7 NO ROUTE (nil) <0x2b77f680>

02 01/18/14 19:50:01.113 ZW_Send_Data node 11 NO ROUTE (nil) LEAK this:778240 start:6991872 to 0x18d3000 <0x2b77f680>
02 01/18/14 19:50:03.242 ZW_Send_Data node 11 NO ROUTE (nil) LEAK this:204800 start:6762496 to 0x189b000 <0x2b77f680>
02 01/18/14 19:50:05.384 ZW_Send_Data node 11 NO ROUTE (nil) LEAK this:-266240 start:6496256 to 0x185a000 <0x2b77f680>

Where it’s trying to send stuff (2 seconds apart, 3x times typically). That’s where my comment came from, since these typically go through on the first call.

That said, it doesn’t look like you have a big/strong ZWave mesh, so that might be more common in smaller setups.

Regarding the RFXtrx plugin. is there something I can do to reduce the log output or should I contact the plugin owner?
Just let the developer know. More logs == more work, and it was one of the larger tuning steps I took when I was sorting out my system (I had a number of plugins that were logging a lot, even in normal mode.
I hope I'm not close to overusing my VL as I'm just getting started. I am logging 32 channels using dataMine, most of them to monitor energy usage. There might be a better way to do this but for now I would like to use dataMine as it's so quick and easy.
Well, you certainly have a lot of plugins running (RFX, Sonos, SystemMonitor, UPnP Agent, Homewave, dataMine, etc) and each running a lot of stuff. Your [i]SystemMonitor[/i] information seems to show a reasonable amount of headroom, but that's reporting data _just_ after startup, so you'd want to see that once it's been running for a while.

You’re definitely doing the right things to offset the costs (USB, NAS, etc), but these boxes are generally resource-constrained. I’ve started using resources from my Mac to do the book keeping stuff (the equivalent of DataMine), and am also looking into moving more of the overall logic out of Vera, time will tell with that one, and what I’m doing is definitely not for everyone :wink:

I really appreciate the help from you, guys. Hopefully I can stabalize my system a bit more.
You're welcome, it's fun debugging the more intricate systems that are having issues (when time permits)

Good to hear that I’ve not reached disaster just yet… :wink: Since yesterday I believe I only had one unintentional reboot at 02:01 am. Think I’ve seen something about nightly reboots in the forum?

These are the current stats:

Mem Available20256kilo bytesCPU Load (1 minute)0.1CPU Load (5 minute)0.14CPU Load (15 minute)0.15Last CMH Reboot04:19:56 Sun 19 Jan 2014Last Vera Restart16:29:58 Sun 19 Jan 2014Last Luup Restart02:01:24 Mon 20 Jan 2014

Regarding Ergy, is there something I should clean out manually from the file system?

You’re completely right that my Z-wave network is not that big yet. I have a bunch of switches and dimmers laying around that I should install. Though, the only device that is giving me a real problem right now is the Aeon multisensor. Before yesterday it was working pretty ok, reporting values as it should, but now I have just gotten a few reports. I moved Vera closer but same result. I should probably put in on USB power when I get the chance.

I would really like to do my own logging outside Vera but I don’t know where I should start. So if you have any hints what I should look at what comes of getting data out of Vera and where to put and process it would be really nice. In the long run I would like to program my own dashboard (webapp or iOS) showing energy consumption, weather etc.

Regarding Ergy, is there something I should clean out manually from the file system?
It's basically the same as other plugins, except it has a "special" place in the UI to setup/remove it. It's been a while since I've tried it, given it's problems, but it probably leaves behind the same type of files that other Plugins use in /etc/cmh-ludl (but, obv, I haven't checked that recently)

If that’s the case, you could manually remove them also. IIRC they’re pseudo-crypto’d, so they have a different file extension than the other LZO files in that directory, so they should be easy enough to find.

I would really like to do my own logging outside Vera but I don't know where I should start. So if you have any hints what I should look at what comes of getting data out of Vera and where to put and process it would be really nice. In the long run I would like to program my own dashboard (webapp or iOS) showing energy consumption, weather etc.
Well, it's over the top, but I have OpenHAB running on my Mac. I setup OpenHAB to "call" Vera on various URL's to fetch current temperature data, and then configured it to graph the various values (in an overlay graph)

It took me about a day to setup, but that’s mostly because I was learning about it, and was downloading the sources and building from scratch.

Anyhow, I now have it graphing the temperature data, and voice announcing the temperature changes (garage, upstairs, downstairs, outside) and all with the built in stuff (HTTP Binding, RRD Database for hour/day/weekly persistence, default HTTP UI for iPhones as well as the option for native apps).

I was going to extend it to doing the same for my energy data… but then decided I might go a different route, and try writing a Vera binding instead, since that’ll allow me to get access to everything that Vera can see/do. We’ll see how that goes, since I haven’t started it yet :wink:

Sounds most impressive. I will have a deeper look at OpenHAB when I get the chance, which probably won’t be anytime soon though with a 1,5 year old daughter and another one of its way… :wink: I’ll see if I can solve my data presentation in a easier way for now, maybe with a future version of EventViewer that looks quite promising.

I will have a look at my filesystem and see what can be cleaned out what comes to old, uninstalled plugins. Is it a good thing to sometimes reset Vera and restore data to start with a clean slate?

Regarding my reboots, I now only have one nightly reboot around 2am. From what I understand this is by design and part of Vera’s healing process?

[quote=“Freddan101, post:14, topic:178944”]I will have a look at my filesystem and see what can be cleaned out what comes to old, uninstalled plugins. Is it a good thing to sometimes reset Vera and restore data to start with a clean slate?

Regarding my reboots, I now only have one nightly reboot around 2am. From what I understand this is by design and part of Vera’s healing process?[/quote]

Starting clean is not a bad idea. It just a tradeoff on the amount of time to get all your settings, scenes, PLEG’s and other stuff back. There is indeed files in the cmh-ludl directory for ergy. It was called something like EchoEnergyManager or something like that. Remember, if you start messing with that directory, you have a very real chance of corrupting your Vera and Vera Support will likely not help much. You will probably have to restore factory default if something goes bad.

The vera should NOT be rebooting at 2:00AM every night. That indicates something else is going on when the heal starts. It should not heal every night either if you have a solid zwave network, but that is a problem I am struggling with right now as well.

[quote=“niharmehta”][quote=“Freddan101, post:14, topic:178944”]I will have a look at my filesystem and see what can be cleaned out what comes to old, uninstalled plugins. Is it a good thing to sometimes reset Vera and restore data to start with a clean slate?

Regarding my reboots, I now only have one nightly reboot around 2am. From what I understand this is by design and part of Vera’s healing process?[/quote]

Starting clean is not a bad idea. It just a tradeoff on the amount of time to get all your settings, scenes, PLEG’s and other stuff back. There is indeed files in the cmh-ludl directory for ergy. It was called something like EchoEnergyManager or something like that. Remember, if you start messing with that directory, you have a very real chance of corrupting your Vera and Vera Support will likely not help much. You will probably have to restore factory default if something goes bad.

The vera should NOT be rebooting at 2:00AM every night. That indicates something else is going on when the heal starts. It should not heal every night either if you have a solid zwave network, but that is a problem I am struggling with right now as well.[/quote]
Sorry, to be clear it’s just LUUP that’s restarting - not the whole Vera. But maybe that shouldn’t happen either?