The continuing frustrations of Vera

[quote=“bucko, post:19, topic:186459”]My UI5 Vera 3 experience: I had big instability issues when I used AutoVera. My Vera would restart several times a day and devices were very slow. I really needed AutoVera, but once I uninstalled it, my Vera was stable with no memory issues. It restarts only during the nightly heal.

I waited a few months and installed it again. Within hours it went back to unstable operation.[/quote]

I liked AutoVera; I used it for a lot of things on my phones. I’m going to try it again today, but I’m suspicious that it may actually be the culprit.

  1. For about the first 60 seconds after Vera restarts, CPU usage is in single digits, 2-5%. At about the 60-second mark, usage jumps suddenly to 90% and then quickly tapers down to the nominal 30-35% level, with cyclical excursions to 65-80% and back down to 30-35%.

  2. I’ve re-installed AutoVera with one Android device linked, so let’s see what happens. This afternoon, depending on the result, I’ll gradually add more Android devices to see if it’s activity linked. Right now it’s stable at 30-35%.

I found that I was able to simulate most of the capabilities of the AutoVera plugin by having the (Android) Tasker program send HTTP requests to Vera.
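For anyone wanting to try the same thing, here is a rough sketch of the kind of URL Tasker would be asked to fetch. It assumes Vera's local HTTP interface on port 3480 and the `data_request` action format as I understand it; the IP address, scene number and device number are made-up placeholders, and the helper names are only for illustration.

```python
from urllib.parse import urlencode

# Placeholder values: substitute your own Vera address, scene and device numbers.
VERA_HOST = "192.168.1.50"  # hypothetical LAN address
VERA_PORT = 3480            # assumed local HTTP port for data_request


def _request_url(params: dict) -> str:
    """Build a data_request URL; ':' is left unescaped so service IDs stay readable."""
    query = urlencode(params, safe=":")
    return f"http://{VERA_HOST}:{VERA_PORT}/data_request?{query}"


def run_scene_url(scene_num: int) -> str:
    """URL that asks Vera to run a scene by number."""
    return _request_url({
        "id": "action",
        "serviceId": "urn:micasaverde-com:serviceId:HomeAutomationGateway1",
        "action": "RunScene",
        "SceneNum": scene_num,
    })


def switch_url(device_num: int, on: bool) -> str:
    """URL that turns a binary switch on or off."""
    return _request_url({
        "id": "action",
        "DeviceNum": device_num,
        "serviceId": "urn:upnp-org:serviceId:SwitchPower1",
        "action": "SetTarget",
        "newTargetValue": 1 if on else 0,
    })


if __name__ == "__main__":
    print(run_scene_url(5))      # paste into a Tasker HTTP Get action
    print(switch_url(12, True))
```

Each Tasker task then only needs one HTTP request with the matching URL, so nothing has to run on Vera itself.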

Right now its stable at 30-35%

I hate to say … this is NOT a good stable number!

Hmmm. I wonder what the relative overhead and load for that is compared to AutoVera? Have you tried to assess that?

[quote=“RichardTSchaefer, post:24, topic:186459”]

Right now its stable at 30-35%

I hate to say … this is NOT a good stable number![/quote]

It strikes me that it isn’t either, but there isn’t much more to get rid of before Vera becomes just an expensive light timer :slight_smile:

I currently have running:
PLEG - 5 instances
Day/Night Plug in
ImperiHome - 3 instances
MultiSwitch - 4 instances
Vera Alerts
MultiString - 1 instance
AutoVera

I’ve just removed the ImperiHome Plug In and restarted Vera. After about 2 minutes the nominal CPU usage is still in the 30-35% range so it doesn’t appear to have much impact.

Nothing much has changed since the earlier report.

The fact that CPU utilization is in single digits for the first two minutes or so leads me to two conclusions: 1) the remaining plug-ins are not the core problem, and 2) something kicks in at the two-minute point, and that something can only reasonably be functions in PLEG: functions I’m using incorrectly, too many functions, poor configuration practice, poor “coding” choices/practices, etc.

  • I have a fair number of self-retriggering timers for automatically turning off lights, etc. but I would think they pose no load until they actually get used i.e. the light is turned on. Most of them are simple, few with additional “blocking logic.”
  • There are a number of schedules to activate lights on/off by time of day, day/night, etc., which obviously have to be evaluated regularly
  • There are no uses of “NOW” in any timing function
  • There are a few interval timers the shortest of which is 90 seconds for the heartbeat function but of course there is only one of those
  • The bulk of the PLEGs is logic to evaluate sensors tripping and conditions that control whether lights go on, e.g. is the ambient light level high enough that the lights aren’t needed, etc.
  • There are some uses of “inet.wget” to update flags on the other Vera so it can track this Vera, etc., and some uses of switches on this Vera set by “inet.wget” from the other Vera for the same reason (Alarm is armed/disarmed, so-and-so is home, etc.)

The current configuration is:

PLEG - 5 instances
Day/Night Plug in
MultiSwitch - 4 instances
Vera Alerts
MultiString - 1 instance

Not much left to remove :slight_smile:

Did you say if you’ve tried a factory reset yet?

Stability is at like 99% for me, but that 1% can be very annoying when you depend on it. I’ll walk into a room, and 99 times out of 100 the lights turn on instantly; the 1 time it stays dark is all you remember.

Everything is still basic scenes for me, but I make creative use of the “Countdown timer”, and “Combination Switch” apps. Since I am still working on adding a VoxCommando server for always-on voice control with a XAP800 microphone mixer, most of the logic that would normally be done via PLEG will all be done there.

I have noticed an enormous amount of Z-Wave traffic on my system, with a ton of errors occurring. I just haven’t been able to find the time to properly diagnose it yet.

Real-time log @ http:///cgi-bin/cmh/log.sh?Device=LuaUPnP

Example error that I constantly get once a minute:

01 03/31/15 12:14:38.508 FileUtils::ReadURL 0/resp:404 user: pass: size 28 https://vera-us-oem-event11.mios.com/list_alerts?hwkey=............ response: ERROR:Module name not found <0x2cad6680>

I really should just start over, but it is not an easy task, with a lot of dismantling required to access in-wall Aeon-Labs or Philio switch/dimmer modules that do not have a switch connected.

Great, as I am writing this I just noticed the following:

02 03/31/15 12:16:30.390 UserData::TempLogFileSystemFailure start <0x2b8d6680>
02 03/31/15 12:16:30.418 UserData::TempLogFileSystemFailure 5047

Are there any guides available on how to debug all these little things, and return to flawless stability? Vera support is great when they take your issue on, but they never really explain in detail what they did. That in turn never benefits the next person, or allows me to do it myself the next time it happens.

Given the fact that CPU utilization is in single digits for the first two minutes or so would lead me to two conclusions: 1) The remaining Plug Ins are not the core problem and 2) something kicks in at the two minute point

Have you checked LuaUPnP.log to see if anything shows up at the point CPU usage climbs? My Edge seems to spend most of its CPU cycles logging the fact that it doesn’t like its own file system. ::)
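One way to spot the moment things take off is to count log lines per minute and see which source dominates. The sketch below is only that, a sketch: it assumes each entry starts with a two-digit level, a `MM/DD/YY HH:MM:SS.mmm` timestamp and then the source name, which is what the excerpts in this thread look like. The sample lines are trimmed copies of lines posted here, and the function name is made up.

```python
import re
from collections import Counter

# Assumed entry layout: "<level> <MM/DD/YY> <HH:MM:SS.mmm> <source> ..."
LINE_RE = re.compile(
    r"^\d{2}\s+(\d{2}/\d{2}/\d{2})\s+(\d{1,2}:\d{2}):\d{2}\.\d+\s*(\S*)"
)


def summarize(lines):
    """Return (entries per minute, entries per source) for an iterable of log lines."""
    per_minute = Counter()
    per_source = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation of a multi-line entry, not a new one
        date, hhmm, source = match.groups()
        per_minute[f"{date} {hhmm}"] += 1
        per_source[source] += 1
    return per_minute, per_source


# Trimmed copies of lines from this thread, used only to exercise the parser.
SAMPLE = [
    "06\t03/31/15 13:58:57.253\tDevice_Variable::m_szValue_set device: 414",
    "{name = 'Garage_LaundryMotion', state = false}",
    "01\t03/31/15 13:59:00.797\tFileUtils::ReadURL 28/resp:0 size 0",
    "01\t03/31/15 13:59:00.809\tRAServerSync::SendAlert retries 3 failed",
]

if __name__ == "__main__":
    minutes, sources = summarize(SAMPLE)
    for minute, count in sorted(minutes.items()):
        print(minute, count)
    print(sources.most_common(3))
```

To run it on a real log, pass an open file instead of `SAMPLE`; on my unit the file lives at `/var/log/cmh/LuaUPnP.log`, but treat that path as an assumption for yours.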

^^ This.

I realize it’s a hassle, but if it was me I’d take a backup and then do a factory reset. If you’re still getting high CPU utilization after that, then it’s either faulty hardware or there’s a phantom/rogue file left over in the system. Remember, Vera is nothing more than a very stripped-down 'Nix computer. From what I’ve heard, the older Vera reset and upgrade routines unfortunately left rogue files in the system. Supposedly MCV is working on better procedures.

[quote=“RexBeckett, post:30, topic:186459”]

Given the fact that CPU utilization is in single digits for the first two minutes or so would lead me to two conclusions: 1) The remaining Plug Ins are not the core problem and 2) something kicks in at the two minute point

Have you checked LuaUPnP.log to see if anything shows-up at the point CPU usage climbs? My Edge seems to spend most of its CPU cycles logging the fact that it doesn’t like its own file system. ::)[/quote]

Well, I hadn’t thought about the Self-Hating File System but … :o

Invoking a scene or toggling a device has almost no overhead. In fact, it seems much snappier than invoking the same scene from a zwave controller. Since the command is sent via the internet rather than zwave, one can see why that occurs.

The amount of increased overhead when querying the status of devices would depend on how often you make the queries from Tasker. I do not yet have an always-on Android device showing the status of devices, so my querying is limited and has little overhead. I can imagine that if you were constantly asking for status updates every minute, then yes, there would be a lot of overhead. But would it be any worse than what AutoVera has to do?

No, I haven’t yet, it fills me with dread :slight_smile: I’d almost rather suffer poor performance than risk losing everything and going back to Day One.

Before I do this I need to find the download link for 1.5.672 needed for the Fibaro Motion Sensors. Then a few stiff drinks, some prayer, a little casting of the bones …

With that load you must be getting a constant stream of inputs … you should look at the log and question why things are happening as often as they are.

If the logs don’t reveal the source of your problem(s), as Richard suggested, I would pull the rip cord and do it (factory reset) on your terms, and not when the Vera craps out and corrupts, because with those numbers I can’t believe it hasn’t happened yet… Recovering from that is NO fun let me tell ya… ???

Well THAT’S a different take on it. That might actually lead somewhere. Of course I’ve never actually had any real success in learning anything from the Vera logs …

In looking at the LuaUPnP logs I see constant streams of this when the utilization jumps from 8-9% to 30% and above


'Garage_LightsLocalON', state = false, seq = 1427818672.056, oseq = 1427820023.4802},{name = 'Garage_HouseDoorTriggered', state = false, seq = 1427818821.1058, oseq = 1427818828.8703},{name = 'Garage_LaundryMotion', state = false, seq = 1427818810.5578, oseq = 1427818820.7484},{name = 'Garage_LaundryLightState', state = '0', seq = 1427820022.4502, oseq = 1427818673.2904},{name = 'Garage_GarageLightState', state = '0', seq = 1427820023.8194, oseq = 1427818672.2379},{name = 'Alarm_ArmedAwaySet', state = false, seq = 1427637411.7611, oseq = 1427637462.1706},{name = 'Alarm_ArmedInstantSet', state = false, seq = 1427779116.8637, oseq = 1427803843.8657},{name = 'Alarm_ViolatedSet', state = false, seq = 1424351569.7392, oseq = 1425649615.3686},{name = 'Alarm_ArmedAway', state = '0', seq = 1427637462.1266, oseq = 1427637411.6},{name = 'Alarm_ArmedInstant', state = '0', seq = 1427803843.8244, oseq = 1427779116.82},{name = 'Alarm_Violated', state = '0', seq = 1424351569.6748, oseq = 0},{name = 'KWH_Reading', state = nil, seq = 1427749862.1063, oseq = 1427696569.7967},{name = 'GuestBath_DoorTimer', state = false, seq = 1427680635.989, oseq = 1427680756.1004},{name = 'GuestBath_SwitchTimer', state = false, seq = 0, oseq = 0},{name = 'GuestRoom_Timer', state = false, seq = 1427604577.7696, oseq = 1427605477.1006},{name = 'Hallway_ClosetDoorOpen', state = false, seq = 1427771643.732, oseq = 1427771650.4225},{name = 'Hallway_ClosetDoorClosed', state = true, seq = 1427771650.4234, oseq = 1427771643.9263},{name = 'Hallway_LightsON', state = false, seq = 1427771645.3749, oseq = 1427771652.023},{name = 'GuestBathroom_DoorClosed', state = false, seq = 1427678999.3933, oseq = 1427680635.5022},{name = 'GuestBathroom_DoorOpened', state = true, seq = 1427680635.5028, oseq = 1427678999.505},{name = 'GuestBathroom_LightsOFF', state = false, seq = 1427821993.4459, os <0x2d489680>
06	03/31/15 13:58:57.253	Device_Variable::m_szValue_set device: 414 service: urn:rts-services-com:serviceId:ProgramLogicC variable: ActionsMap was: {} now: {} #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:1 __LEAK__ this:-933888 start:4136960 to 0x1536000 <0x2b755000>
04	03/31/15 13:58:57.287	 <0x2b755000>

Then it suddenly switches to line after line, thousands of them, of this:

01	03/31/15 13:59:00.797	FileUtils::ReadURL 28/resp:0 size 0 http://192.168.15.168/get_status.cgi <0x2c089680>
01	03/31/15 13:59:00.808	FileUtils::ReadURL 7/resp:0 size 0 https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-31%2007:05:31&LocalTimestamp=1427799931&AlertType=3&SourceType=3&Argument=56&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= <0x2c089680>
01	03/31/15 13:59:00.809	RAServerSync::SendAlert retries 3 failed https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-31%2007:05:31&LocalTimestamp=1427799931&AlertType=3&SourceType=3&Argument=56&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= age: 24809 file: /etc/cmh/persist/a_1427750113_1427799931_1824d28 <0x2c089680>
01	03/31/15 13:59:00.825	FileUtils::ReadURL 7/resp:0 size 0 https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-26%2019:44:29&LocalTimestamp=1427413469&AlertType=3&SourceType=3&Argument=60&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= <0x2c089680>
01	03/31/15 13:59:00.826	RAServerSync::SendAlert retries 3 failed https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-26%2019:44:29&LocalTimestamp=1427413469&AlertType=3&SourceType=3&Argument=60&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= age: 411271 file: /etc/cmh/persist/a_1427407920_1427413469_1eac728 <0x2c089680>
01	03/31/15 13:59:00.901	FileUtils::ReadURL 7/resp:0 size 0 https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-28%2019:32:51&LocalTimestamp=1427585571&AlertType=3&SourceType=3&Argument=57&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= <0x2c089680>
01	03/31/15 13:59:00.901	RAServerSync::SendAlert retries 3 failed https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-28%2019:32:51&LocalTimestamp=1427585571&AlertType=3&SourceType=3&Argument=57&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= age: 239169 file: /etc/cmh/persist/a_1427583752_1427585571_fda330 <0x2c089680>
01	03/31/15 13:59:00.938	FileUtils::ReadURL 7/resp:0 size 0 https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-25%2019:26:00&LocalTimestamp=1427325960&AlertType=3&SourceType=3&Argument=60&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= <0x2c089680>
01	03/31/15 13:59:00.938	RAServerSync::SendAlert retries 3 failed https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=414&LocalDate=2015-03-25%2019:26:00&LocalTimestamp=1427325960&AlertType=3&SourceType=3&Argument=60&Format=&Code=ConditionSatisfied&Value=Vera2HeartbeatRCVD&Description=Vera%202%20Heartbeat%20Received&Users= age: 498780 file: /etc/cmh/persist/a_1427325418_1427325960_18fb8f8 <0x2c089680>
01	03/31/15 13:59:00.965	FileUtils::ReadURL 7/resp:0 size 0 https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=412&LocalDate=2015-03-24%2019:14:04&LocalTimestamp=1427238844&AlertType=3&SourceType=3&Argument=36&Format=&Code=Status&Value=0&Description=Time%20of%20Day%20Sunset&Users=627671 <0x2c089680>
01	03/31/15 13:59:00.966	RAServerSync::SendAlert retries 3 failed https://127.0.0.1/alert?PK_AccessPoint=30010119&HW_Key=l764eZaXaGF3SRF8RkXeUdClxodkgy2o&DeviceID=412&LocalDate=2015-03-24%2019:14:04&LocalTimestamp=1427238844&AlertType=3&SourceType=3&Argument=36&Format=&Code=Status&Value=0&Description=Time%20of%20Day%20Sunset&Users=627671 age: 585896 file: /etc/cmh/persist/a_1427234942_1427238844_1578d58 <0x2c089680>

Something is definitely wrong …
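To get a feel for what is in that flood, a small script can pull the alert description, its age and the queued file out of each `SendAlert ... failed` line. This is a sketch under assumptions: I am guessing that `age` is in seconds and that each `file:` path is a stored alert being retried. The arithmetic fits, since `LocalTimestamp + age` comes out the same for every line above. The sample lines are trimmed copies of the ones posted (most URL parameters removed) and the function name is made up.

```python
import re
from urllib.parse import parse_qs, urlsplit

FAILED_RE = re.compile(
    r"RAServerSync::SendAlert retries \d+ failed (\S+) age: (\d+) file: (\S+)"
)


def failed_alerts(lines):
    """Yield one dict per failed alert found in the given log lines."""
    for line in lines:
        match = FAILED_RE.search(line)
        if match is None:
            continue
        url, age, path = match.group(1), int(match.group(2)), match.group(3)
        query = parse_qs(urlsplit(url).query)
        stamp = int(query["LocalTimestamp"][0])
        yield {
            "description": query.get("Description", ["?"])[0],
            "age_days": round(age / 86400, 1),   # assumes age is in seconds
            "retry_time": stamp + age,           # when the retry was attempted
            "file": path,
        }


# Trimmed copies of two lines from the log above.
SAMPLE = [
    "01\t03/31/15 13:59:00.809\tRAServerSync::SendAlert retries 3 failed "
    "https://127.0.0.1/alert?DeviceID=414&LocalTimestamp=1427799931"
    "&Description=Vera%202%20Heartbeat%20Received&Users= "
    "age: 24809 file: /etc/cmh/persist/a_1427750113_1427799931_1824d28 <0x2c089680>",
    "01\t03/31/15 13:59:00.966\tRAServerSync::SendAlert retries 3 failed "
    "https://127.0.0.1/alert?DeviceID=412&LocalTimestamp=1427238844"
    "&Description=Time%20of%20Day%20Sunset&Users=627671 "
    "age: 585896 file: /etc/cmh/persist/a_1427234942_1427238844_1578d58 <0x2c089680>",
]

if __name__ == "__main__":
    for alert in failed_alerts(SAMPLE):
        print(alert["age_days"], "days:", alert["description"], alert["file"])
```

If that reading is right, Vera is repeatedly retrying alerts that are up to about a week old, which would at least explain the volume of lines, though I can't say whether it explains the CPU load.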

Well, I bit the bullet and did a factory reset, and now I can’t access the UI. Running top over SSH shows LuaUPnP has not started, so now I have to wait for Tech Support.

Sorry.