Vera Plus Zigbee crash since 7.0.19

Not sure if anyone else is observing this problem, but I am currently seeing Zigbee devices very frequently go to “device not detected” and become unresponsive to commands.

I believe I first observed this on 7.0.19, but the drops were rare then. I had actually been running 7.0.20 for several months without this problem, apart from a few occurrences right after the firmware upgrade. The drops became very frequent on 7.0.21 and 7.0.22, and I downgraded each time for this reason. I have now upgraded to 7.0.23, which fixes a problem where losing the internet connection triggered a Luup reload, and I am getting notifications that my Zigbee network has dropped about once every 2 hours after every Luup reload.

I have contacted CC, and even called, but still have not gotten a useful response.
A simple Luup reload fixes the problem for an hour or two, and then Zigbee crashes again. On 7.0.19 I had to reboot the Vera to recover; a Luup reload alone used not to work. Now it does.

Further discovery:

After a Luup reload, Zigbee seems to work but ends up crashing anywhere from a few seconds to an hour after the reload.
When it crashes, the devices don’t respond to commands and the UI shows “device not detected”; the notification that the device is offline only arrives 1 to 2 hours later.
It has become very consistent on 7.0.23, was only occasional on 7.0.21, and almost never occurred on 7.0.20 and 7.0.19.

It looks to me as if the Zigbee module/driver/task within the OS is crashing and a Luup reload restarts it.
Waiting for CC’s response, which so far has been unhelpful.

Well, no response from CC, but something strange happened that appears to have fixed the problem:

  1. Updated AltUI to the latest version. The previous version seemed to be working fine, and I was using it to gauge the Zigbee device connections. I had noticed that Zigbee feedback was slow whenever I had the Zigbee problem, even while the network was working.
  2. Disabled auto-update on the idiotic and useless Sercomm IP cam plugin.

Obviously the two actions above triggered Luup reloads.

Lo and behold, now the wattage of the Centralite switch updates instantly after the switch turns on, and the Zigbee network no longer dies after a few minutes. :-\ :cry: ??? :o

Edit: spoke too soon. It still crashed 45 min later…

Out of desperation, I took the following steps, which resolved the problem:

  1. Downgraded to 7.0.20 without resetting my settings and without restoring from a backup. (I had to downgrade the WWNest plugin, though, as its latest version requires the newer firmware.)

→ No change; Zigbee still dropped after anywhere from a few minutes to an hour.

  2. Removed the Sercomm plugin, which had been installed automatically during the firmware upgrade.

→ Problem resolved. I am no longer seeing any Zigbee drops, even after 2 hours.

I am not sure whether it is a resource problem or a bug in the Sercomm plugin, which really should be removed from the firmware anyway. I remember previously having some errors saving user data, which CC resolved by removing a lot of useless files from the Vera.

Quick update on this.

After more troubleshooting, with input from CC:

7.0.23: one failure in under 2 hours
7.0.20: one failure every 2 to 12 hours
I had never observed this problem before 7.0.19.

When Zigbee drops, none of my devices seem to indicate that they lost their network connection. Only the Vera reports that it can’t contact any of the Zigbee devices.
A Luup reload recovers from the problem 95% of the time. Perhaps, when it fails to recover, the event that causes the failure occurs during the Luup reload itself. Not sure.
I even used an RF scanner to look at the entire 2.4 GHz band, and the channel I am using for the Vera is very clean.
This leads me to think that the failure is not in the Zigbee network itself, which seems to be communicating fine, but only in the Vera.
I suspect that the underpowered CPU on the Vera Plus intermittently causes the Luup engine to run into timing conflicts or fall out of sync with the Zigbee driver. With my 130+ Z-Wave devices, 16 plugins, 30+ virtual devices, and 100+ scenes, the newer firmware versions with their additional features might be pushing the CPU over the edge and increasing the frequency of Luup engine failures.
This out-of-sync phenomenon actually seems quite frequent: I occasionally see Luup reload errors where “error in scene or startup lua” appears after a reload, and reloading manually clears the error.
No indication of either DRAM or Flash storage running out.

CC tells me they are looking into it but are unable to reproduce the problem. I suppose they do not have a real-life system with 100+ devices. It might be time for me to move to a multi-controller system.