Securing and stabilizing the Vera by taking it off the grid

Watch for a reload when the clocks change back from DST too!

These calls seem to be buried in the LuaUPnP program. I wish I could get the code to modify it… It’s very frustrating. Between this, the cumulative Event server calls, the poor command queue management and the nightly heals, these 4 “features” are killing this platform.

Same here. I got a luup reload at 0:00 local time.

Same here. Got the alert at 12:00:31am…

I really fail to understand the logic behind a monthly Luup Reload. So is it “Our program is so buggy that we have no idea how to fix it so let’s reboot it once a month!”??

Why not every other minute? My system seems reasonably stable now but why oh why would you implement a forced power cycle on an arbitrary schedule? What if I was doing something important right at that moment?

To the new owners… before you start trying to add more features, please fix the code!

[quote]Why not every other minute?[/quote]
As we know, many Veras do restart often. As did mine, until I relieved them of a lot of plugin duty by transferring it to openLuup 8)

To be fair, I would suggest a lot of plugins need a bit more work, probably including some of my own: https://github.com/a-lurker?tab=repositories
The new owners could work out which ones, by analysing the thousands of Veras hooked up to their servers.

The one other thing that it would simplify, which I learned when implementing openLuup, is repeating monthly timers. This is actually a tricky area because of the different month and year lengths. This might seem trivial, but it’s not. The same is absolutely true of the two DST changes a year; otherwise you have to cope with missing or repeated time intervals.
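To illustrate why monthly timers are fiddly, here is a minimal POSIX sh sketch of the month-length calculation a "same day every month" timer needs, including the Gregorian leap-year rule, plus one possible policy (clamping to the last day of the month) for a timer set on the 31st. This is not openLuup's code (openLuup is written in Lua), and the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: number of days in month $2 (1-12) of year $1.
# Gregorian rule: leap if divisible by 4, except centuries not divisible by 400.
days_in_month() {
  y=$1 m=$2
  case $m in
    1|3|5|7|8|10|12) echo 31 ;;
    4|6|9|11) echo 30 ;;
    2)
      if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
        echo 29
      else
        echo 28
      fi ;;
    *) echo "days_in_month: bad month '$m'" >&2; return 1 ;;
  esac
}

# Day on which a "monthly on day $3" timer fires in month $2 of year $1.
# Policy assumed here: clamp to the month's last day (31st -> Feb 28/29).
# Skipping short months entirely would be another valid policy.
monthly_fire_day() {
  last=$(days_in_month "$1" "$2") || return 1
  if [ "$3" -gt "$last" ]; then echo "$last"; else echo "$3"; fi
}
```

For example, `monthly_fire_day 2024 2 31` gives 29, while the same timer in March fires on the 31st. DST adds a second dimension: a daily timer set inside the skipped or repeated hour has to decide whether to fire zero, one or two times.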

So there are excuses, but perhaps no real reasons.

Making progress:

There is a network monitor program running in the background on the vera which can cause havoc if one wants to isolate it from the outside world. After I killed it, I started getting a full vera reboot every 10 hours. I just found out why this morning. The log rotate script (/usr/bin/Rotate_Logs.sh), which runs as a cron job, includes code to either reload luup or reboot the vera if the network monitor fails. Its purpose is to be able to upload logs to the mios servers. These reload functions and network checks are active even if you have “upload logs to the mios server” disabled on the vera. I also noticed that it copies logs from one folder to another before uploading, which is absurd because one is a symbolic link to the other, so it is actually overwriting itself. Way to increase NAND cell wear!
The 10-hour interval was simply how long it took for the log to grow large enough to trigger a rotation. I ended up commenting out a lot of the code to prevent Luup reloads and vera reboots, as I am convinced that the reload would not fix any problem. This is another example of “I don’t know where the bug is, so let me just reboot” instead of looking for the bug. Instead of fixing anything, it actually increases the probability of data corruption and is likely to get the vera into an infinite reload loop.
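Copying a folder onto itself through a symlink is easy to guard against. A minimal sketch, assuming POSIX sh: resolve both directories to their physical paths with `pwd -P` and skip the copy when they match. The function names are hypothetical, and this is not the code in Rotate_Logs.sh.

```sh
#!/bin/sh
# Print the physical (symlink-resolved) path of directory $1.
physical_dir() {
  (cd "$1" 2>/dev/null && pwd -P)
}

# Succeed if $1 and $2 resolve to the same directory.
same_dir() {
  a=$(physical_dir "$1") || return 1
  b=$(physical_dir "$2") || return 1
  [ "$a" = "$b" ]
}

# Hypothetical guarded copy. Copying a directory onto itself achieves
# nothing; depending on the cp implementation it either fails or just
# rewrites the same files (extra flash writes for no benefit).
copy_logs() {
  src=$1 dst=$2
  if same_dir "$src" "$dst"; then
    echo "copy_logs: $src and $dst are the same directory, skipping" >&2
    return 0
  fi
  cp -p "$src"/* "$dst"/
}
```

The check costs two `cd` calls, so it can sit in front of any copy step in a cron job without noticeable overhead.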

To address your concern about NAND wear. Everything that is stored in /tmp folder, is NOT stored on the onboard NAND, but in RAM.

Unless it is in /tmp/log/cmh… which is where these logs are getting copied from and to. Yes, the rest is in RAM.

I think I am pretty close to where I can be on this topic. It seems like I am now able to run almost forever without Luup reloads due to internet connection drops, network errors or memory issues. The one leftover source of Luup reloads might be within the zwave network, which I hope the Luup engine code is not allowing.

Disabling “networkmonitor” aka “NM” by adding an exit 0 into its start script has made the vera much snappier, and the modification to my Rotate_Logs.sh script has eliminated the vera reloads. I have fixed the absurd mios system time resetting scripts and am now fully relying on the native openWRT scripts for it. The logging mechanism is also the main source of the drop in available memory… so reliable log rotation is key.
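The "exit 0" trick can be applied in a repeatable way. A minimal sketch, not the contents of modvera.sh: it assumes line 1 of the target is a shebang, inserts the exit as line 2 so the rest of the script never runs, and keeps a pristine copy as `<file>.orig` for reverting. The function name and the marker text are hypothetical.

```sh
#!/bin/sh
# Marker inserted right after the shebang; everything below it never runs.
DISABLE_MARKER='exit 0 # disabled by local mod'

# Disable shell script $1 by inserting $DISABLE_MARKER as line 2.
# Assumes line 1 is the shebang. Keeps "$1.orig" (first run only).
disable_script() {
  f=$1
  [ -f "$f" ] || { echo "disable_script: $f not found" >&2; return 1; }
  if grep -qxF "$DISABLE_MARKER" "$f"; then
    return 0   # already disabled, nothing to do
  fi
  [ -e "$f.orig" ] || cp -p "$f" "$f.orig" || return 1
  {
    head -n 1 "$f"
    echo "$DISABLE_MARKER"
    tail -n +2 "$f"
  } > "$f.tmp" || return 1
  # cat into the existing file so its owner and mode bits are kept.
  cat "$f.tmp" > "$f" && rm -f "$f.tmp"
}
```

On the vera this would be called as, for example, `disable_script /usr/bin/Start_networkmonitor.sh` (the file name as given in this thread; check the exact name on your unit). Running it twice is harmless because of the marker check.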

I also disabled MiOSRestApi.sh, which contains functions that make calls to the MIOS API server. This now prevents the vera from checking for new firmware updates, amongst other things, and could possibly be preventing the automated zwave nightly heals. This is making me wonder about the monthly Luup reload too, and the end of the month is coming…

Other mods: I updated “serialapi_controller_static_ZM5304_US.hex” to the latest version (6.81), which I obtained from Silabs. This does not seem to matter much, but the file now matches the firmware SDK; it seems vera did not update it as part of the firmware upgrade, since the original file looks older and smaller. I have also rewritten some of the scripts so that the internet LED depends on successful ntp server checks rather than on the networkmonitor, which I disabled, and so that the service LED is linked to the Luup engine status.
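The LED logic described here could look roughly like the following, meant to run from cron. A minimal sketch under several assumptions: the LEDs are exposed through the standard Linux LED class in /sys/class/leds, the LED names are placeholders (list that directory on the unit for the real ones), the NTP server is a placeholder, the engine process is named LuaUPnP, and the BusyBox ntpd flags should be checked with `ntpd --help` on your build. Function and variable names are hypothetical, not the author's rewritten scripts.

```sh
#!/bin/sh
# Root of the Linux LED class in sysfs (overridable for testing).
LED_ROOT=${LED_ROOT:-/sys/class/leds}
# Hypothetical LED names: run "ls /sys/class/leds" on the unit for real ones.
INTERNET_LED=${INTERNET_LED:-internet}
SERVICE_LED=${SERVICE_LED:-service}
# NTP server to probe (assumption: use whichever server your unit syncs with).
NTP_SERVER=${NTP_SERVER:-pool.ntp.org}

# Write brightness $2 (1 = on, 0 = off) to LED $1; fail if it does not exist.
set_led() {
  [ -w "$LED_ROOT/$1/brightness" ] || return 1
  echo "$2" > "$LED_ROOT/$1/brightness"
}

# Internet check: one-shot BusyBox ntpd query (-q quit once the clock
# is set, -n stay in the foreground, -p server). It also sets the clock.
ntp_ok() {
  ntpd -q -n -p "$NTP_SERVER" >/dev/null 2>&1
}

# Service check: is the LuaUPnP engine process running?
luup_ok() {
  pidof LuaUPnP >/dev/null 2>&1
}

# Drive both LEDs from the two checks.
update_leds() {
  if ntp_ok; then set_led "$INTERNET_LED" 1; else set_led "$INTERNET_LED" 0; fi
  if luup_ok; then set_led "$SERVICE_LED" 1; else set_led "$SERVICE_LED" 0; fi
}
```

If an ntpd daemon is already running, the one-shot query may fail to bind its port, so pointing `ntp_ok` at whatever status the native openWRT time sync exposes could be a better fit.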

[quote=“rafale77, post:111, topic:199140”]I think I am pretty close to where I can be on this topic. …[/quote]

I hope you are in the process of creating some sort of tutorial for the rest of us to follow… Keep up the good work! Watching this thread closely.

Lol, I am too. This is probably my favorite thread on this forum!

[quote=“tomtcom, post:113, topic:199140”]Lol, I am too. This is probably my favorite thread on this forum![/quote]

I definitely will. I am even considering writing a script you can execute to do all of this in one shot. So far it’s been holding up. I have not had any spontaneous reload since, but for the sake of testing I did a few reloads manually. I will now let it be for a month and see if it ever reloads.
Available memory has been oscillating between 180MB and 203MB.
On a different note, I just wish mcv would provide a LuaUPnP test program for Windows like they used to for UI5, so that I could run some tests on another platform.
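To log available memory the same way over a month, a small helper can read /proc/meminfo. A minimal sketch, assuming a Linux kernel: MemAvailable only exists on kernels 3.14 and later, so on older kernels it falls back to MemFree + Buffers + Cached, which is only a rough approximation. The function name is hypothetical, and this is not necessarily how the figures above were measured.

```sh
#!/bin/sh
# Print available memory in MB from a meminfo-format file (default
# /proc/meminfo). Prefers MemAvailable; otherwise estimates it as
# MemFree + Buffers + Cached (all values in kB).
mem_available_mb() {
  awk '
    $1 == "MemAvailable:" { avail = $2; have = 1 }
    $1 == "MemFree:"      { free = $2 }
    $1 == "Buffers:"      { buf = $2 }
    $1 == "Cached:"       { cached = $2 }
    END {
      kb = have ? avail : free + buf + cached
      printf "%d\n", kb / 1024
    }' "${1:-/proc/meminfo}"
}
```

A cron entry running `echo "$(date) $(mem_available_mb) MB" >> /tmp/mem.log` would keep the history in RAM (/tmp), so the logging itself adds no NAND wear.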

Got the luup reload on the 1st day of the month, which I haven’t found a way to disable. The interesting thing I learned is the total time each reboot and reload takes on my system. Given how many devices I have, and since my startup lua runs a number of functions to establish variable watches, a full reboot takes ~1m40s and a Luup reload takes 33s. My openLuup, which runs a lot more devices, plugins and all my automation, reloads in less than 3s on a single-threaded VM…

It also appears that I have somehow managed to disable the automated nightly heal, as ALTUI is reporting that my latest heal is 4 days old. This would be interesting, as I have not yet figured out where or how this happened. I am wondering whether it is really not happening or whether it is the logging of that event that is no longer occurring. In any case, I started thinking about creating a script to hack the time on the vera, which would obviously mess up the logs too, to prevent all these unneeded time-dependent events.

I am sharing the summary of all the mods I did to the Vera Plus to stabilize it and essentially disconnect it (almost completely) from the mios server.
This was done on firmware 7.0.26.01 (1.7.3831), as the latest release had catastrophic issues with secure class devices on my system. In spite of the help I have gotten from hours on the phone with the mcv support team, I had not been able to make my vera a reliable, set-it-and-forget-it system until now. It took a lot of testing and onion-peeling of the system to conclude that vera is trying to do too much with too little. openLuup was one major leap toward stability, but the occasional vera crashes were still annoying and required intervention.

With all these mods, my vera has not had any luup reload or vera reboots except for the 1st of the month at midnight.
I practically eliminated all the weird “device not detected” and data corruption issues.
I may also have somehow disabled the nightly heal, though I am still trying to confirm this: I used to have very random heal events reported by ALTUI, and none have recurred in a week.
All plugins not relying on the MIOS servers work. All scenes, Lua code and startup Lua work. I am only disabling the internet connectivity checks and the underlying services connecting to the mios servers. The App Store, which uses regular http calls, still works.

What no longer works:

- Automated firmware check with the mios server
- vera notifications (I am using pushover; if you use vera alert, you likely don’t care)
- vera mobile apps, as they rely on a remote access relay (I am running everything off of Homewave and openLuup and stopped using the slow, buggy iOS app years ago)
- logging to the mios server. All the logs will remain local

The one service I have not been able to disable, and which still prevents the vera from running completely without the internet, is the event server. The vera sends alerts to it, and they accumulate unless one either dumps the files and does a luup reload or allows the vera to connect to its mios event server. This is in spite of the great help and attempts of a CC agent. The call is unfortunately made within the LuaUPnP program, which is the vera engine itself and a compiled C++ binary.

So here are the mods:

  1. Unrelated to disconnection, but purely for stability/reliability, I extrooted the vera
    see here: http://forum.micasaverde.com/index.php/topic,103140.0.html

The main reasons for doing it are:
a. A high number of people, including myself, have reported NAND flash failures due to excessive write cycles on a very limited amount of storage.
http://forum.micasaverde.com/index.php/topic,109476.0.html
b. Limited storage has recently caused havoc on firmware upgrades, to the point of bricking vera units.

Unfortunately I have only been able to make this work on the VeraPlus. The Vera Edge and older Veras do not seem to want to mount the external drive on boot.
This is low risk for anyone to try, since a failure would just make the vera boot from its original NAND flash.
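Since a failed extroot silently falls back to the internal NAND, it is worth confirming after each reboot that the external drive is really in use. A minimal sketch, assuming the standard OpenWrt extroot layout in which the external drive is mounted at /overlay and appears as a /dev/sd* device; the linked guide may set things up differently, and the function name is hypothetical.

```sh
#!/bin/sh
# Succeed if /overlay (OpenWrt's writable layer) is mounted from a
# USB/SATA disk (/dev/sd*) rather than from the internal flash.
# Reads a mounts-format file, /proc/mounts by default.
is_extroot() {
  awk '$2 == "/overlay" && $1 ~ /^\/dev\/sd/ { found = 1 } END { exit !found }' \
    "${1:-/proc/mounts}"
}
```

Usage: `is_extroot && echo "overlay is on the external drive"`; `df -h /overlay` shows the same thing by eye, with a capacity matching the USB drive rather than the small internal flash.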

  2. Modified 6 files in the /usr/bin folder
    a. mios-services.sh: killed the service functions.
    b. MIOSRestApi.sh: disabled all the MIOS server calls.
    c. Start_networkmonitor.sh: disabled the network monitor, which is a source of Luup reloads, uses resources and is practically useless. It checks for network and internet connectivity and reboots/reloads, which is the last thing I wanted it to do. It also seems to slow down the vera.
    d. sync.time.sh: disabled. One of many time sync scripts; this one is redundant and obsolete.
    e. Start_LuaUPnP.sh: very slight mods to control the service LED on the unit. Since network monitor is disabled, this LED would be off even when the vera is up and running. I made it depend on the vera engine only.
    f. Rotate_Logs.sh: removed a nonsensical reload and reboot script which would cause the vera to reboot upon server connection failure, even if you have log uploading disabled. I also brute-force disabled the log uploading.
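Before editing Rotate_Logs.sh (or any of these scripts) by hand, it helps to list every uncommented line that could trigger a reboot or reload. A minimal sketch; the search patterns are a guess at what such lines look like, not taken from the actual script, and the function name is hypothetical.

```sh
#!/bin/sh
# List, with line numbers, the uncommented lines of script $1 that look
# like they trigger a reboot or a reload. Patterns are a guess: review
# every hit by hand before changing anything.
find_restart_calls() {
  grep -n -E 'reboot|[Rr]eload' "$1" | grep -v -E '^[0-9]+:[[:space:]]*#'
}
```

For example `find_restart_calls /usr/bin/Rotate_Logs.sh`. When neutralizing a hit, replacing the command with `:` (the shell no-op) is safer than commenting it out: if the commented line was the only statement inside an `if ... then ... fi`, the empty branch is a syntax error in sh and the whole script stops working.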

  3. Modified files in /etc/init.d

check_internet, tunnels_manager.sh, wan_failover, cmh-ra and wol: all disabled for obvious reasons.
mios_fix_time.sh: modified to eliminate a strange time reset to Jan 1st 2000, but I guess I could disable this script altogether given the number of redundant time resetting scripts.
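The init.d part can be scripted along these lines. A minimal sketch, assuming each of these scripts follows OpenWrt's rc.common convention and therefore accepts `stop` and `disable`; that is worth checking per script, especially for tunnels_manager.sh. The function name and the INIT_DIR override are hypothetical, and mios_fix_time.sh is left out because it was modified rather than disabled.

```sh
#!/bin/sh
# Directory holding the init scripts (overridable for testing).
INIT_DIR=${INIT_DIR:-/etc/init.d}

# Stop each named service now and remove it from the boot sequence.
# Assumption: each script follows OpenWrt's rc.common convention.
disable_services() {
  for svc in "$@"; do
    s="$INIT_DIR/$svc"
    if [ ! -x "$s" ]; then
      echo "disable_services: $s missing or not executable, skipping" >&2
      continue
    fi
    "$s" stop
    "$s" disable
  done
}
```

Usage: `disable_services check_internet tunnels_manager.sh wan_failover cmh-ra wol`. With rc.common scripts, `/etc/init.d/<name> enable` undoes `disable`, which keeps this step easy to revert.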

  4. Updated the zwave sdk API file to 6.81 (in /etc/zwave) to match the zwave chip firmware. I guess it is either a mistake in the firmware or mcv purposely kept a 2-year-old version of the file to save 20KB of space.

I have attached a zip file to decompress and upload to the vera; running modvera.sh will update all the files and reboot the vera in one shot.

Usage of the file: decompress it and SCP the entire folder to the vera. Now SSH into the vera, go into the folder and type the following:

chmod +x modvera.sh && ./modvera.sh

At the end of the script you will get an ash error, because the script deletes itself; you can ignore it.

In combination with openLuup running all my automation and plugins, I now finally have a stable system!

Rafale, wow! Good description. Though for me this is a bridge too far…

I would suggest that Melih read your post very carefully, because I have the same reasons for doing all this.

Especially the buggy gen5 secure class zwave support (or rather nonsupport): it is breaking everything I had running fine until this release, and it is the culprit of all the issues I have on my system.

Keep up the good work!

Great write-up, and I really appreciate all the investigative work you have done. I have no idea what most of what you have written means, as I’m not technical, but I believe it keeps pressure on Vera to get things sorted… or I’ll be joining the crowd moving elsewhere unless we see something soon. Not the six months that keeps getting mentioned.

Ran the script on my test Vera Plus with no issues. I’ll keep an eye on things and report back.

Thanks for all of the good work!

Thank you for testing. I just installed this on a test vera running the latest 4001 7.0.27 firmware as well, with no problems. There is actually no difference in OS build between these two versions, so the only difference is between the two programs. I will also write a script to revert it in case it is needed.
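A revert script can be as simple as restoring saved originals. A minimal sketch, assuming each modified file was first copied to `<file>.orig` next to it; modvera.sh as posted may not keep backups this way, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical revert helper: for each file given, restore the pristine
# copy saved as "<file>.orig" and remove the backup.
# Returns non-zero if any file could not be restored.
revert_files() {
  rc=0
  for f in "$@"; do
    if [ -f "$f.orig" ]; then
      # cat keeps the current file's owner and mode bits.
      cat "$f.orig" > "$f" && rm -f "$f.orig" || rc=1
    else
      echo "revert_files: no backup for $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example `revert_files /usr/bin/Rotate_Logs.sh /usr/bin/MIOSRestApi.sh` followed by a reboot. Services that were disabled through rc.common would also need `/etc/init.d/<name> enable`, and the zwave hex file would need its own backup.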