Stabilizing the Vera: random reboots, memory leaks and logging

I have been observing the behavior and stability of my Vera system for a little while using the sysmon plugin, and I have noticed two things:

  1. Memory seems to leak between reboots, and to a lesser degree between Luup restarts: the available memory keeps decreasing until it seems to stabilize, in my case from ~90 MB to 60 MB, and
  2. then, after a few hours, I get a restart, either a CMH restart or a full Vera reboot. After each restart the available memory recovers to the 85-90 MB range.
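To put a number on the leak rather than eyeballing the sysmon graph, free-memory readings can be reduced to a single slope. This is a minimal sketch, assuming you periodically copy /proc/meminfo from the Vera (for example over SSH) and analyse the samples on another machine. The helper names `parse_memfree_kb` and `leak_rate_mb_per_hour` are hypothetical, and MemFree may not be exactly the figure sysmon plots.

```python
import re
from typing import Optional, Sequence, Tuple


def parse_memfree_kb(meminfo_text: str) -> Optional[int]:
    """Return the MemFree value (in kB) from /proc/meminfo-style text, or None if absent."""
    match = re.search(r"^MemFree:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def leak_rate_mb_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (hours, free_MB) samples.

    A negative result means free memory is shrinking. Because the leak described
    here flattens out after a while, fit only the samples before it stabilizes.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var
```

A clearly negative slope that levels off later matches point 1 above; a slope near zero after a configuration change suggests the change helped.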

I have opened a ticket with Vera, but support takes so long between responses that they cannot capture the log around a restart, even when I have the time to help. Today I got a response indicating that I seem to be logging a lot to the server, which could cause instability, and it was suggested that I log to a USB drive. The problem is that I am already doing that, and have been since I replaced my Vera Lite with a Vera Edge.

I had the goofy idea of reducing the log content by unchecking “Show polling activity” and “Show individual jobs” in the Logs settings. The result so far is that I no longer observe what I thought was a memory leak: my available memory, which used to drop by 10 MB in an afternoon, has now been hovering around 85 MB.
It is too soon to know whether this eliminates the restarts (which I had every few days) or just makes them less frequent, but I find it odd that the logs apparently are kept in RAM while they are also written to USB and archived to the server.

You say you are storing logs on a USB device. Have you confirmed this? Are you sure the USB drive is mounted?

Did you uncheck “Archive old logs on server”? If not, the logs are uploaded to the server and deleted from the USB drive.

You didn’t turn on Verbose logging, did you? That shouldn’t be on. If it is, you will have the problems you describe.

Have you tried telephoning support? It has proven, unsurprisingly, to be a faster option than the others.

Yes, I checked all of the above. The drive shows as mounted and USB logging is enabled. (I can browse the USB drive over SSH, and the LED on the drive blinks on activity.)
I did not turn on verbose logging.
After one day I still see memory decreasing over time, but at a much slower rate than before. In a couple of days I will see whether I am still getting the reboots.
Many people seem to report random reboots of the Vera. I assumed they were fairly common, but I am trying to eliminate them.
I have not tried phoning support. I may try that next.

[quote=“anhman, post:1, topic:188071”]My available memory which would decrease by 10mb in an afternoon has now been hovering around 85mb.
It is too soon to know whether it eliminates or decrease the frequency of restarts which I had every few days but I am finding odd that the logs apparently are stored in the RAM while they are written to USB and archived to the server at the same time.[/quote]

Have you determined whether the changes you made actually made a difference over long-term observation? Thanks.

This seems to be a really common issue at the moment.

Support have just started looking into my issues; I’ll post anything they suggest or find.

[quote=“anhman”]It is too soon to know whether it eliminates or decrease the frequency of restarts which I had every few days but I am finding odd that the logs apparently are stored in the RAM while they are written to USB and archived to the server at the same time.[/quote]
I found that NOT using a mounted USB drive greatly reduced the frequency of reboots. I tried to find the thread on this subject, and it seems that others have had the same experience.

If you do NOT use a USB thumb drive, then the log files use up memory, i.e. it’s a big memory leak.
The more you put in the log, the faster the leak!
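A quick way to answer “are the logs really on the USB drive?” is to check which file system holds the log file. This sketch assumes you have copied the contents of /proc/mounts from the Vera and know the fully resolved path of the log file (symlinks followed, since /proc/mounts only lists real mount points). The function names and the example paths in it are hypothetical. A tmpfs or ramfs result means the log lives in RAM.

```python
from typing import Optional, Tuple

# File system types whose contents are held in RAM.
RAM_FILESYSTEMS = {"tmpfs", "ramfs"}


def mount_for_path(path: str, mounts_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (device, mount_point, fs_type) of the deepest mount containing path.

    mounts_text is /proc/mounts content: "device mount_point fs_type options ...".
    Escaped characters in mount points (e.g. \\040 for a space) are not decoded here.
    """
    best: Optional[Tuple[str, str, str]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" so a later mount on the same point (which shadows the earlier one) wins.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type)
    return best


def logs_in_ram(path: str, mounts_text: str) -> bool:
    """True if the file system holding path is RAM-backed."""
    entry = mount_for_path(path, mounts_text)
    return entry is not None and entry[2] in RAM_FILESYSTEMS
```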

This is the thread I was initially thinking about: Memory impact of USB logging

In that thread several people noticed an improvement by not mounting the USB drive. I had similar results to SirMeili and turned off USB logging. In addition, I stopped logging anything that wasn’t immediately needed. I’m with anhman: reduce the information in the logs to only what is needed. I believe unchecking the boxes in the GUI has the same effect as modifying the config file (/etc/cmh/cmh.conf). More information can be found here. I found this a very effective way to reduce the leak and help stabilize the system.

As a side note, while looking into this I evaluated rotating the logs more frequently. This had similar results, because you are just discarding old logs. You can do this by running /usr/bin/Rotate_Logs.sh 1. Note, however, that Vera’s Rotate_Logs.sh script currently has a bug that causes it to keep only the last archive file: the script renames the log to archiveFile="${logFile}_1" instead of archiveFile="${logFile}.1" as the rest of the script expects. I reported this to Vera, but it seemed to be a low-priority bug at the time.
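To see why the `_1` versus `.1` naming mismatch leaves only one archive, here is a small Python model of numbered log rotation. It is not Vera’s Rotate_Logs.sh, only a sketch of the general scheme the script is described as expecting (newest archive at log.1, older ones shifted up by one). The `rotate` function and its `archive_suffix` parameter are hypothetical, and files are modelled as a dict of {filename: contents}.

```python
from typing import Dict


def rotate(files: Dict[str, str], log_file: str, keep: int, archive_suffix: str = ".1") -> None:
    """Simulate numbered rotation, keeping at most `keep` archives.

    log.1 is the newest archive and log.<keep> the oldest. The oldest is dropped,
    log.1 .. log.(keep-1) are shifted up by one, then the live log is renamed to
    log_file + archive_suffix.
    """
    # Drop the oldest archive.
    files.pop(f"{log_file}.{keep}", None)
    # Shift the remaining archives: log.(i) -> log.(i+1), oldest first.
    for i in range(keep - 1, 0, -1):
        src = f"{log_file}.{i}"
        if src in files:
            files[f"{log_file}.{i + 1}"] = files.pop(src)
    # Archive the live log. With archive_suffix="_1" the shift loop above never
    # sees this file again, so each rotation just overwrites the same archive.
    if log_file in files:
        files[log_file + archive_suffix] = files.pop(log_file)
```

In this model, three rotations with the default suffix leave LuaUPnP.log.1, .2 and .3; with `archive_suffix="_1"` they leave only LuaUPnP.log_1, holding the most recent log.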

Yes, using a USB drive uses some memory … but that is a FIXED amount and is not that significant.
The peak memory usage from LOGS is larger than this FIXED amount.
So memory utilization looks better right after you reboot or rotate logs, but it looks worse right before LuaUPnP reloads!

If this alone causes you problems, then you are already way overloaded!

Yes, you can cut back on logs, but then you will have no idea why you are crashing all the time.

A lot of people have added monitoring plugins to their Vera when their Vera has become unstable because of a lack of memory … guess what … it gets worse …

You need to pursue strategies to REDUCE memory use … and that means cutting back on the total count of plugin devices, or buying a Vera with more memory.

A Vera3/Edge has twice as much memory as a Vera Lite, and a Vera Plus has twice as much memory as a Vera3/Edge.

Which config uses less memory and space?

A- 1 scene with 5 triggers
B- 5 scenes with 1 trigger

[quote=“michelhamelin, post:10, topic:188071”]Which Config use less memory and space?

A- 1 scene with 5 triggers
B- 5 scenes with 1 trigger[/quote]

In terms of memory, I think you would be hard put to tell the difference. In terms of user_data, the latter requires a bit more code and a bit more processing. But, really, if you’re re-writing your logic at this level to optimize Vera’s stability, then I think you’re probably going down the wrong path.

Bottom line: I really don’t think it matters.

[quote=“michelhamelin, post:10, topic:188071”]Which Config use less memory and space?

A- 1 scene with 5 triggers
B- 5 scenes with 1 trigger[/quote]
I believe that the 5 scenes would use ever so slightly more memory, but I don’t think it would matter much until you extrapolated it out to a very large number of scenes. There are only 5 triggers for Vera to watch in either case; the scenes are just groupings/scripts that contain the triggers. But you would have a larger user_data.json with 5 scenes.

A. Scene1
     Trigger1
     Trigger2
     Trigger3
     Trigger4
     Trigger5

B. Scene1
     Trigger1
   Scene2
     Trigger2
   Scene3
     Trigger3
   Scene4
     Trigger4
   Scene5
     Trigger5

Put more triggers in the scene or add some Luup code and the memory consumption grows further. But, like @akbooer, I don’t think it matters.

Hi all,

I also had a strange stability issue with my Vera Lite that I wanted to share with you.

A few months ago, thanks to Datamine2 notifications, I noticed that my Vera was constantly restarting… The rate was a few reboots per hour. When I say reboot, I mean a Lua crash.
Like everybody, I first suspected a memory leak, so I removed a few plugins and cleaned up as much as I could.

The reboots then settled to exactly one every 1h10: Vera was restarting every hour and ten minutes, like clockwork.
I didn’t have time to investigate further.

A few weeks later, one of my Fibaro Smoke Sensors (Z-Wave Plus version) began to beep randomly like hell, at least every 10 minutes, sometimes twice within five minutes.
Replacing the batteries didn’t help; neither did rebooting the Vera; reconfiguring the node worked but didn’t help either…

Then I decided to exclude and re-include the device, but then I saw that the wake-up interval of this device is 4200 seconds… which is 1h10. So I excluded it but didn’t re-include it. After a few days I can now confirm that my Vera hasn’t crashed since I removed the Fibaro Smoke Sensor V2.
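A fixed period like this is easy to miss by eye, so here is a small sketch that checks restart times against a suspected device interval (4200 s = 70 min = 1h10). It assumes you have extracted restart timestamps in the YYYY-MM-DD HH:MM:SS form the LuaUPnP log uses; the function names and the 60-second tolerance are hypothetical choices.

```python
from datetime import datetime
from typing import List, Sequence


def intervals_seconds(timestamps: Sequence[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> List[float]:
    """Seconds between consecutive restart timestamps, after sorting oldest first."""
    times = sorted(datetime.strptime(ts, fmt) for ts in timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def matches_period(intervals: Sequence[float], period_s: float, tolerance_s: float = 60.0) -> bool:
    """True if every interval is within tolerance_s of a whole multiple of period_s.

    Whole multiples are accepted so that a missed restart (or a skipped wake-up)
    does not hide the pattern. An empty list returns False.
    """
    for gap in intervals:
        multiple = max(1, round(gap / period_s))
        if abs(gap - multiple * period_s) > tolerance_s:
            return False
    return bool(intervals)
```

For example, `matches_period(intervals_seconds(restart_times), 4200)` asks whether the restarts line up with the sensor’s wake-up interval, where `restart_times` is your own list of timestamps.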

I haven’t had time to re-include it yet; I hope to do so before next week.
I will keep you informed.

Maybe this information could be useful to someone,

Thomasss,

There may be something to this. I have Lua restarts every hour or two like you; removing every plugin made no difference, and I still had crashes.

So it’s something on the Z-Wave network or in the software stack that is causing some sort of hold-off that incurs delays… eventually crashing Lua. It makes sense that it’s related to Vera’s error handling of a Z-Wave device, but it’s likely different for each of us. One thing I will try is removing my Trane thermostats… they always give me trouble and have really long poll turnaround times, around 10 seconds.

I probably should just nuke my entire setup, but given that most of my devices are switches, I can probably remove everything else one by one first and maybe get results without blowing everything up. At least that way I can find the issue.


Does anyone know what to look for in the LuaUPnP log to signify the start of a reboot? I inserted some code so that I get a text message (and a log entry) when the Vera reboots, and I am trying to find where the reboot happens in the log to see whether it is something I can fix. The problem is that everything looks normal just before my log entry shows up, so I assume there are a bunch of startup entries before my call. Does anyone know what I should be looking for?

Well, I went back through the logs a bit and finally got to the point where I think the Vera crashed. It shows:

2016-03-14 18:50:11 - LuaUPnP Terminated with Exit Code: 143

Does anyone have a list of exit codes?

143 usually means the application caught a SIGTERM signal (128 + 15), meaning the process was killed.
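Exit statuses above 128 follow the usual shell convention of 128 plus the signal number, so 143 is 128 + 15 (SIGTERM), and a segfault would typically show up as 139 (128 + 11, SIGSEGV). This sketch decodes such codes and pulls matching lines out of a log; the regular expression assumes the exact line format quoted above, and the helper names are hypothetical. Signal names are looked up on the machine running the script: SIGKILL, SIGSEGV and SIGTERM have the same numbers on common Linux architectures, but some less common signals are numbered differently, so double-check anything unusual.

```python
import re
import signal
from typing import Iterable, List, Tuple

# Matches e.g. "2016-03-14 18:50:11 - LuaUPnP Terminated with Exit Code: 143"
EXIT_RE = re.compile(r"^(\S+ \S+) - LuaUPnP Terminated with Exit Code: (\d+)")


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status; values above 128 mean killed by signal (code - 128)."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


def find_terminations(log_lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, exit_code, description) for each termination line found."""
    found = []
    for line in log_lines:
        match = EXIT_RE.match(line)
        if match:
            code = int(match.group(2))
            found.append((match.group(1), code, describe_exit_code(code)))
    return found
```

The timestamps it returns can then be matched against the system messages log around the same time.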

Thanks akbooer, I will see if I can match the LuaUPnP log against the messages log to find the culprit.

A segv is typically caused by one of two problems:

  1. A BUG in Vera’s core software … I have seen a few, but they are not common.
  2. Running out of memory. In this case the memory allocation returns address 0 (a NULL pointer), the software uses it anyway and barfs.
    This is pretty common on a memory-stressed Vera (in particular a Lite, since it has the least memory).

[quote=“RichardTSchaefer, post:19, topic:188071”]A segv is typically caused by two problems:

  1. A BUG in VERA’s core software … I have seen a few, but they are not common.
  2. Running out of memory, In this case memory returns the address of 0, software uses it and barfs.
    This is pretty common on a memory stressed Vera (In particular a Lite, since they are the Lightest on memory).[/quote]

I have been monitoring my VeraLite for a few days now, trying to determine whether it is a memory issue. Using top, I see the LuaUPnP process always at 115% or higher, but there is always free memory in the box:

Mem: 59536K used, 2944K free, 0K shrd, 11188K buff, 19016K cached
CPU: 6% usr 3% sys 0% nic 89% idle 0% io 0% irq 0% sirq
Load average: 0.21 0.20 0.22 2/84 26301
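The “free” figure in top counts only memory that is completely unused; buffers and page cache can usually be reclaimed when a process needs more. This sketch parses a BusyBox-style Mem: line (format assumed from the output above) and adds free + buff + cached. Treat the sum as an upper bound: files kept on RAM-backed file systems such as tmpfs are also counted as cached but cannot be dropped. The helper names are hypothetical.

```python
import re
from typing import Dict


def parse_top_mem_line(line: str) -> Dict[str, int]:
    """Parse a top 'Mem:' line such as '59536K used, 2944K free, ...' into {field: kB}."""
    return {name: int(value) for value, name in re.findall(r"(\d+)K (\w+)", line)}


def reclaimable_kb(mem: Dict[str, int]) -> int:
    """Upper bound on memory the kernel could hand out: free + buffers + page cache."""
    return mem.get("free", 0) + mem.get("buff", 0) + mem.get("cached", 0)
```

For the Mem: line above this gives 2944K + 11188K + 19016K = 33148K, roughly 32 MB, compared with under 3 MB strictly free.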