Frequent restarts

Hi all,

My Vera Plus (or rather the LUUP engine?) restarts frequently, many times every day. This is becoming a major issue, as it causes scenes to fail to run, miss triggers, or skip delayed sequences.
The last things I see in the log before the reboot are always the same three lines:

01 12/11/17 16:06:52.355 UserData::WriteUserData saved--before move File Size: 137362 save size 137362 <0x77058520>
02 12/11/17 16:06:52.355 UserData::TempLogFileSystemFailure start 0 <0x77058520>
02 12/11/17 16:06:52.380 UserData::TempLogFileSystemFailure 6054 res:1

Then it seems to attempt to restart, fails, retries and succeeds. Log entry for first attempt:

/etc/cmh/wan_failover:
-rw-r--r-- 1 root root 44 Aug 14 2015 check_internet.hosts
<0x77058520>
02 12/11/17 16:18:52.467 UserData::TempLogFileSystemFailure start 0 <0x77058520>
02 12/11/17 16:18:52.494 UserData::TempLogFileSystemFailure 5965 res:1
-rw-r--r-- 1 root root 33 Oct 11 08:47 /etc/cmh/HW_Key

I run the DataMine Plugin that logs to USB. Vera logging to USB is not enabled.

Any suggestions as to how I could fix this issue are greatly appreciated!

By any chance, do the restarts coincide with a scene being triggered?

I see the issue described above when a scene records from two or more cameras.

Vera tells me it is a known bug they hope to fix soon…lol.

It would not surprise me if this issue occurs with other scene activations outside of camera recording.

Good question; the answer is that it is hard to tell. See below.

Did your log have the same error messages?
02 12/11/17 16:12:52.373 UserData::TempLogFileSystemFailure start 0 <0x77058520>
02 12/11/17 16:12:52.413 UserData::TempLogFileSystemFailure 6054 res:1
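
For anyone comparing their own logs: the three `TempLogFileSystemFailure start` entries quoted so far (16:06:52, 16:12:52 and 16:18:52) are almost exactly six minutes apart, which hints at something periodic rather than random. Below is a minimal Python sketch for checking this against a copy of LuaUPnP.log. It assumes the `MM/DD/YY HH:MM:SS.mmm` timestamp format seen in the excerpts, the function names are made up for illustration, and it is meant to run on a PC against a copied log rather than on the controller itself.

```python
import re
from datetime import datetime

# Timestamp followed (anywhere later on the line) by the failure-start marker.
# The pattern is an assumption based on the excerpts quoted in this thread.
START_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*TempLogFileSystemFailure start"
)

def failure_times(lines):
    """Return the sorted timestamps of 'TempLogFileSystemFailure start' lines."""
    times = []
    for line in lines:
        match = START_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S.%f"))
    return sorted(times)

def gaps_in_seconds(times):
    """Return the gap between each pair of consecutive timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

# The three entries quoted in the thread (colour codes omitted).
sample = [
    "02 12/11/17 16:06:52.355 UserData::TempLogFileSystemFailure start 0 <0x77058520>",
    "02 12/11/17 16:18:52.467 UserData::TempLogFileSystemFailure start 0 <0x77058520>",
    "02 12/11/17 16:12:52.373 UserData::TempLogFileSystemFailure start 0 <0x77058520>",
]

for gap in gaps_in_seconds(failure_times(sample)):
    print(f"{gap:.1f} s between failures")
```

To use it on a real log, replace `sample` with the lines of the copied file, for example `open("LuaUPnP.log", errors="replace")`.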

I see three scenes that are triggered all the time. They really shouldn’t be: one of them is in fact deactivated, and the other two have triggers that shouldn’t behave like this. They have one thing in common, which is that the Heliotrope plugin is a trigger (sun goes below/sun goes above). They do run, but return false immediately, so they don’t do anything most of the time. They also run constantly, while the restarts happen far less frequently.

I deleted the deactivated scene, and removed the Heliotrope trigger from the other. Let’s see if that changes anything.

I also find some strange things in the log; it seems two LightSensors are behaving strangely. The log is full of entries like these:
12/11/17 16:09:09.109 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:09.763 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:09.827 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:09.907 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:11.263 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:11.919 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:12.003 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:12.039 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:13.459 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:14.101 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:14.163 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
06 12/11/17 16:09:14.223 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>
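
The excerpt above holds twelve updates in about five seconds (16:09:09.109 to 16:09:14.223), every one with an unchanged value and `duplicate:1`. To see which devices are responsible across a whole log, a rough Python sketch like the following could tally those entries per device. The regular expression is an assumption derived only from the lines quoted here, the function name is hypothetical, and the script is intended for a copied log on a PC.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# "... m_szValue_set device: 338 service: <urn> variable: ... was: 50 now: 50 ... duplicate:1 ..."
UPDATE_RE = re.compile(
    r"m_szValue_set device: (\d+) service: (\S+) .*? was: (\S+) now: (\S+) .*?duplicate:(\d)"
)

def count_duplicate_updates(lines):
    """Count updates flagged duplicate:1, keyed by (device id, service)."""
    counts = Counter()
    for line in lines:
        match = UPDATE_RE.search(line)
        if match and match.group(5) == "1":
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts

# Three of the quoted entries, shortened to the fields the pattern needs.
sample = [
    "06 12/11/17 16:09:09.763 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>",
    "06 12/11/17 16:09:09.827 Device_Variable::m_szValue_set device: 447 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 39 now: 39 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>",
    "06 12/11/17 16:09:11.263 Device_Variable::m_szValue_set device: 338 service: urn:micasaverde-com:serviceId:LightSensor1 variable: CurrentLevel was: 50 now: 50 #hooks: 0 upnp: 0 skip: 0 v:0xf951e0/NONE duplicate:1 <0x76e58520>",
]

duplicates = count_duplicate_updates(sample)
for (device, service), total in duplicates.most_common():
    print(device, service, total)
```

A device that tops this list by a wide margin is a reasonable first candidate to exclude or re-configure, though a high count alone does not prove it causes the restarts.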

Watching. I’ve consistently experienced the same behavior since initial installation and configuration of my Vera Plus a couple months ago. So far tech support hasn’t been able to help (to be fair, I’ve not gone in-depth with them yet and they’ve only attempted one thing which they explained as “restarted the relays”).

Inzax, I don’t have any cameras.

@Thorden. Not sure about the logs. I will have to take a look. I threw it in the lap of tech support and waited for the reply I got. I was up to my eyeballs in other projects at the time.

I’ve experienced frequent Luup engine restarts on and off in the few years I have had my VeraPlus. I set up a notification and logging script in my startup Lua so I could track them. They would occur at seemingly random times, anywhere from every 10 minutes to every hour or so, which really made the system unreliable. The first time it happened, tech support helped me identify a bad Z-Wave device that was “flooding” the network. After I removed it, everything was stable for a long time. Very recently I started having issues again. Tech support is still trying to isolate the cause, but I was able to identify another sensor, an “EZMultiPli Z-Wave Multi-Sensor” that I installed in Feb, that was all of a sudden causing restarts again. It didn’t happen until I installed some new switches. I removed it a few days ago and all is stable again.

I’ve looked into alternative hubs/controllers and don’t like anything as much as the Vera. I use some Vera scenes but mostly AltUI’s workflows to do complex automation. When things are stable, it’s the best controller out there for what I use it for.

I moved most of my scenes to PLEG and it seems to survive LUUP restarts quite well. Vera restarts about 2 to 10 times a day (at 2 pm and 7 pm just this afternoon).

Virtually all my scenes are via PLEG as well. I agree, they seem to survive the restarts generally quite well. However, my wife works from home and has an uncanny ability to trigger (or attempt to trigger) an HA action just as the controller is reloading. WAF is still relatively low for the system in general, and I’d really like to get to the bottom of the frequent restarts.

Actually, I think that any activity on an idle Vera can trigger a restart.
Mine typically restarts when there is motion in the house after it has been idle for a while.

Thank you so much for your feedback!

I see in my logs now that restarts happen at least every hour. Totally unacceptable. I have had Vera systems since Vera2 first came out in 2010, and it has never been as bad as this before.
Will submit a support ticket to see what they can find out.

I know we all have a lot of time, effort and money invested in Vera, but seriously, are there alternatives that are more stable? I would consider a switch; I am a bit tired of fiddling around with endless trouble. I want a stable smart home, not something you need to invest hours and hours in to keep working. I can’t even sell my house in the unstable, quite useless state it is in now, so I have to set everything back to old-fashioned before I sell it. That will cost a lot, take a lot of time, and reduce the value too. Seriously Vera, you need to get this thing working in a reliable and user-friendly way, or it will never achieve broad market acceptance.

There are many alternatives, all more stable and cheaper than Vera, but each comes with a new learning curve, which means you will have to invest time in it.
Home Assistant and openHAB are the well-known local ones and are open source. Along the same lines you can also find Domoticz and ioBroker. You will need your own little computer (a Raspberry Pi will probably suffice) to host it. I personally use virtual machines on a server.
I have been fighting issues with the Vera myself and have moved a lot of my automation to openLuup. See the section on this forum.

I have not found any of the commercial solutions very attractive, except maybe for Homeseer, because they all require cloud processing and are not very stable either (wink, ST).

Thanks, Rafale77, that was very helpful information!

I will set up openLuup immediately; it looks promising. Lack of hardware is not an issue. :)

My Vera Secure just experienced a major meltdown today. Non-stop reboots for 45 minutes. Unable to access any devices or do any troubleshooting. I was able to grab a small chunk of the logs between reboots. One line was very disconcerting:

01 12/12/17 18:43:04.293 FileUtils::WriteBufferIntoFile cannot write to /etc/cmh/user_data.json.lzo.new err No space left on device <0x773fb520>

No space left on device?? The dashboard shows 62% used. Had to pull the battery and do a full restart. So far all seems ok, but still no idea why this event occurred today. Power users - any thoughts?
If it happens again I will send a support request. So far it is an isolated incident, albeit a totally unexplained one.
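
One thing worth checking here (an assumption, not something the log line proves) is whether the dashboard percentage and the failed write refer to the same filesystem, and whether that filesystem has run out of inodes rather than bytes. The Python sketch below reports both figures for a given path using the standard `os.statvfs` call. The function name and the path list are illustrative, and the controller may not ship with Python, so treat this as a description of what to look at rather than a ready-made tool.

```python
import os

def usage_report(path):
    """Return byte and inode usage for the filesystem that holds `path`."""
    st = os.statvfs(path)

    def pct_used(total, available):
        # Some filesystems report 0 total inodes; return None instead of dividing by zero.
        if total == 0:
            return None
        return round(100 * (total - available) / total, 1)

    return {
        "path": path,
        "bytes_used_pct": pct_used(st.f_blocks, st.f_bavail),
        "inodes_used_pct": pct_used(st.f_files, st.f_favail),
        "free_bytes": st.f_bavail * st.f_frsize,
    }

# Paths are examples only; "/etc/cmh" is where the failed write in the log was aimed.
for path in ("/", "/etc/cmh", "/tmp"):
    if os.path.exists(path):
        print(usage_report(path))
```

If the byte percentage for the path in the error message is far higher than the dashboard figure, the two are probably describing different partitions.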

This is not unheard of… several threads found if you search for “no space left on device.”

Solutions include just waiting, or calling support. I did notice this one in particular…

[quote=“Sorin M., post:3, topic:197685”]Sercan,

Try to reach our Customer Care team and they will help you on the spot. Details in my signature.
This might happen when restoring on VeraSecure a backup from an old legacy unit.[/quote]

As far as restarts, my VeraPlus running 1.7.3232 has been pretty stable. Sure it does restart, usually about 1 restart in the early morning each day. Unfortunately, the Vera system doesn’t handle exceptions very well. I slowly built my system (over 100 devices now), write most of my own code, try to limit 3rd party apps, buy quality hardware, and just systematically make relatively small and controlled adjustments in logic. This has worked for me so far. I can ignore it for months at a time pretty easily.

The fortunate, or unfortunate, reality is that Vera still has more to offer than most systems out there. I have to say that their support is improving along with how much they pay attention to the forums.

Thanks for the responses. Overall I am VERY happy with my Vera Secure; however, yesterday’s incident really came out of the blue. I work very hard at making my system as lean as possible while still providing in-depth information on exactly how the system is performing, what events it ‘sees’ and what actions it is taking. It’s been 24 hours since the meltdown and all is good, but that does make me wonder if I should do a scheduled FULL RESTART every once in a while. I used to do this with an old router that would drop its connection if left running for days on end. A reboot made it much more reliable. There are several posts that detail the Lua code to do this, but I’ve never seen a need. Any opinions? Pros and cons, if possible.

Well shortly after my last post where I stated all was well, Major Meltdown #2 occurred. Now even a cold reboot (power cord out, battery disconnected) will not revive my controller. Sent a Tech Support ticket followed by a call to customer service. Waiting on a reply. Vera Secure stuck in a reboot/restart loop. Today in the notification panel I see the following:

Luup : Failed to download all plugins. Will retry in 10 minutes.
Panel Manager[19] : Running Lua Startup

No new plug-ins in weeks, so maybe this is an update? Virtual Panel may be the culprit or may be an innocent bystander. Can’t tell since the unit never gets fully loaded in order to see what works and what does not. Dead in the water until I hear from Customer Service.

Hmm, I probably don’t have good news for you. I’ve lost two Edges within a year for a similar reason. Fortunately both were under warranty.

In both cases I first encountered hang-ups where I had to unplug the device to restart it and get it working again. After the reboot I saw the error message “Can’t write user data” on the dashboard; the time of the alert was the same as when I lost connection with the controller.
After two or three incidents like that, both controllers hung and went into a continuous reboot loop. I contacted support; they organized a call with me and tried to get the controllers working by guiding me through what to do and by trying to get access by logging in remotely to my computer. No luck.

In both cases the comment from support was that it seems that it probably can’t erase the overloaded data and thus can’t finish the reboot.

In the first case, the controller finally broke right after I asked support for help with this “can’t write user data” error: I got an e-mail saying that they had removed some temp files, and just after that the controller went into a reboot loop.
In the second case it happened after a short power outage.
At first I was convinced that the final breakdown was due to the support intervention (like accidentally removing the wrong files during the cleanup); after the second case I’m not so sure anymore (I have to underline that in both cases support did an excellent job trying to help me).

The first one was also my first Vera, so I can imagine that it could have become less stable because I did the setup partly by trial and error.
For the second controller I re-created the network from scratch, avoiding installation of unnecessary plugins. There were no major changes in the setup before the breakdown, so I can’t point to any suspected cause.
As I use it in a remote location in the country (say: a vacation house), I’ve installed a UPS to guard against power outages and plan to put the controller into a sealed box to shield it from its surroundings (since it is in a vacation house, temperatures inside can be very low during winter and rise relatively quickly when the heating is turned on; a sealed box in an insulated place should minimize the influence of such changes).
But I have other electronics working under the same conditions (wireless router, Netatmo weather station and camera) and they work without issues.

I don’t know what I’ll do if the third controller breaks. Some people here use backup controllers (if the first one breaks, you power on the second and recreate your network from the backup file), but replacing two controllers within a year is not a good ratio, especially once the warranty ends. I need a stable solution, as I need to keep the home running even if nobody is there for a month or so. I still consider Vera one of the best options for that purpose and hope my issues are just a “black series”.

What is the trace of a Luup-engine reload in the log? Not sure if I am looking for the right stuff.
/tmp/log/cmh/LuaUPnP.log

Does Verbose logging need to be enabled to see it?

Thanks!

Here is my follow up report of the issue:
After emailing and leaving a message with Customer Service I began to poke around in Vera’s file structure. I’m no expert by any stretch, but I can certainly see when a particular folder is being filled up with ‘junk’ files. In this case it was /etc/cmh/persist. Every system log event was being written to a one-line file in that folder, so in a matter of minutes there were hundreds of 65k files. In examining them, I could correlate each file with one line of log data. I took a BIG RISK and decided to delete the oldest dozen or so files. Almost immediately a dozen more were written. Since it didn’t totally crash or stop rebooting, I decided to erase the oldest 1000 files. After about 30 seconds of ‘thinking’ the reboot completed and the system started running normally. Wow! But I also started seeing more new files being written. Within a few minutes the available space was used up and the reboots started again.
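
The "delete the oldest files first" step described above can be sketched in Python as follows. This is an illustration of the idea only: the function names are hypothetical, it defaults to a dry run that merely lists what would go, and the demo works on a temporary folder with made-up file names rather than on a controller. As the post itself stresses, deleting files on a live unit is risky, and freeing space this way does not stop whatever is creating the files.

```python
import os
import tempfile
from pathlib import Path

def oldest_files(folder, count):
    """Return up to `count` regular files in `folder`, oldest modification time first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[:count]

def prune_oldest(folder, count, dry_run=True):
    """List (dry run) or delete the `count` oldest files; return their names."""
    names = []
    for path in oldest_files(folder, count):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names

# Demo on a throwaway folder so nothing real is touched.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(5):
        demo_file = Path(demo_dir) / f"event_{i}.log"
        demo_file.write_text("one log line\n")
        os.utime(demo_file, (1000 + i, 1000 + i))  # force a known age order

    preview = prune_oldest(demo_dir, 2)                 # dry run: nothing removed
    deleted = prune_oldest(demo_dir, 2, dry_run=False)  # actually removes two files
    remaining = sorted(p.name for p in Path(demo_dir).iterdir())

print(preview, deleted, remaining)
```

Running the dry run first and reading the list before deleting anything is the main design choice here, since there is no undo.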

Next I deleted ALL the junk files from the folder and, once the controller was up and running, I did a full restore to one day prior. The issue continued, so I repeated the process with a restore from two days prior. Same problem. On the third try I restored from the first of December (13 days prior). ALL GOOD!! It turns out that on Dec 7 something changed on my machine. Not sure what, but something. I was able to restore back to Dec 7 and the system has been stable ever since, with NO files written to /etc/cmh/persist in 24 hours!

Tech support did call me back after all this took place. Since the system was now stable, they really couldn’t help me diagnose the issue, but they did offer to clean up temporary files and folders and remove some unused plug-ins. I give them credit for what they did, but I also feel that they weren’t able to get to a root cause because of my actions. To be clear, I have no beefs with Customer Service, other than a small language barrier issue. They responded promptly and were willing to help even after all I had done.

Bottom Line: I am still happy with my Vera and the system I have here. This issue clearly was not resolved after my first post even though it seemed to be. Only time will tell if it is now back to normal. I feel incredibly lucky that I did not brick my controller and the only clue I had as to where to look was this one line in the logs:

01 12/12/17 18:43:04.293 FileUtils::WriteBufferIntoFile cannot write to /etc/cmh/user_data.json.lzo.new err No space left on device <0x773fb520>