Zigbee Devices Offline... ALL OF THEM

Hello, I’ve been having some problems with Reactor and couldn’t figure out why. Now I’m thinking it has something to do with my Zigbee network.

My entire Zigbee network is down. I’ve tried restoring to a backup from about a week ago (I definitely wasn’t having any problems then), but the same problem persists. I have 6 Stelpro SMT402 Zigbee thermostats and 2 Iris Smart Plugs. None of those devices are responding to commands.

I’m getting this in the log for all of my Zigbee devices. (Device 126 is a Stelpro thermostat, but I’m getting the same sort of logs for all of them.)

JobHandler::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:0 Id: 34 is 49.577123000 seconds old <0x7583c520>
02 12/01/19 10:02:01.683 ZBJob_PollNode::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:0 Id: 34 Sending Command <0x7583c520>
02 12/01/19 10:02:06.390 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:02:06.391 ZBJob_PollNode::receiveFrame job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:5 Id: 34 Response Error <0x75a3c520>
02 12/01/19 10:02:06.402 ZBJob_PollNode::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:5 Id: 34 Job finished with errors <0x7583c520>
06 12/01/19 10:02:06.403 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 14 now: 15 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:02:06.403 Device_Basic::AddPoll 126 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:02:06.404 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.70 now: 4.60 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:02:06.405 <0x7583c520>
02 12/01/19 10:02:06.405 JobHandler::PurgeCompletedJobs purge job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:2 Id: 34 zb_polling status 2 <0x7583c520>

02 12/01/19 10:05:12.101 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 96 <0x7663c520>
02 12/01/19 10:05:12.102 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 99 <0x7663c520>
02 12/01/19 10:05:12.123 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 101 <0x7663c520>
02 12/01/19 10:05:12.123 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 104 <0x7663c520>
02 12/01/19 10:05:12.124 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 110 <0x7663c520>
02 12/01/19 10:05:12.125 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 121 <0x7663c520>
02 12/01/19 10:05:12.125 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 126 <0x7663c520>
02 12/01/19 10:05:12.126 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 133 <0x7663c520>
02 12/01/19 10:05:12.132 ZBJob_PollNode::Run job#61 :zb_polling dev:96 (0x175b318) P:100 S:0 Id: 61 Sending Command <0x7583c520>
02 12/01/19 10:05:16.844 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:05:16.844 ZBJob_PollNode::receiveFrame job#61 :zb_polling dev:96 (0x175b318) P:100 S:5 Id: 61 Response Error <0x75a3c520>
02 12/01/19 10:05:16.856 ZBJob_PollNode::Run job#61 :zb_polling dev:96 (0x175b318) P:100 S:5 Id: 61 Job finished with errors <0x7583c520>
06 12/01/19 10:05:16.856 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 23 now: 24 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:05:16.857 Device_Basic::AddPoll 96 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:05:16.858 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.20 now: 4.10 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:05:16.858 <0x7583c520>
02 12/01/19 10:05:16.859 JobHandler::PurgeCompletedJobs purge job#61 :zb_polling dev:96 (0x175b318) P:100 S:2 Id: 61 zb_polling status 2 <0x7583c520>
02 12/01/19 10:05:16.859 JobHandler::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:0 Id: 62 is 4.757592000 seconds old <0x7583c520>
02 12/01/19 10:05:16.859 ZBJob_PollNode::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:0 Id: 62 Sending Command <0x7583c520>
02 12/01/19 10:05:21.570 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:05:21.570 ZBJob_PollNode::receiveFrame job#62 :zb_polling dev:99 (0x175b508) P:100 S:5 Id: 62 Response Error <0x75a3c520>
02 12/01/19 10:05:21.582 ZBJob_PollNode::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:5 Id: 62 Job finished with errors <0x7583c520>
06 12/01/19 10:05:21.583 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 26 now: 27 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:05:21.583 Device_Basic::AddPoll 99 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:05:21.584 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.20 now: 4.10 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:05:21.585 <0x7583c520>
02 12/01/19 10:05:21.585 JobHandler::PurgeCompletedJobs purge job#62 :zb_polling dev:99 (0x175b508) P:100 S:2 Id: 62 zb_polling status 2 <0x7583c520>
02 12/01/19 10:05:21.585 JobHandler::Run job#63 :zb_polling dev:101 (0x187f458) P:100 S:0 Id: 63 is 9.463236000 seconds old <0x7583c520>
02 12/01/19 10:05:21.586 ZBJob_PollNode::Run job#63 :zb_polling dev:101 (0x187f458) P:100 S:0 Id: 63 Sending Command <0x7583c520>
02 12/01/19 10:05:26.297 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66

Has anyone had any similar experience with their entire Zigbee network failing?

Have you power cycled the Vera? If the Zigbee co-processor is stuck, this might unstick it.

Hi, I have indeed tried to power cycle. I’ve performed both hard and soft reboots of the Vera with no luck. I’ve also restored to a backup from about a week ago (I know for sure that things were working then) and yet still no Zigbee.

If it were just one or two devices I’d think it was the device itself, but it’s the ENTIRE Zigbee network. It has to be the Vera, no?

The only thing I can think of in this case is a failed Zigbee chip. It’s a hardware failure. To validate: after your reboot, could you reset and try to add a new Zigbee device?

That, or perhaps a failed Zigbee device that’s jabbering on the network. @rafale77’s idea is a good one and should give us more data. The other possibility is that some software component of the Zigbee network has become corrupted.

OK, so I did another hard reboot, waiting about 30 minutes before powering back up, just to make sure the device was starting cold. When I got back into the UI, all my Zigbee devices were still displaying ‘Can’t Detect Device’.

I then removed one of my Iris Smart Plugs and re-added it. The device was successfully re-added and is working.

My other Zigbee devices are still listed as ‘Can’t Detect Device’. Is my only option to remove and re-add all of my devices?

I’d like to find out what caused this problem in the first place if I am able.

Not sure where to go from here. If I need to re-add everything, so be it. Any ideas on how to troubleshoot this?

Cheers,

One other trick. Power cycle each of them without deleting and re-adding them. It now appears that you have one device messing with your network.

Yup. Next step.

OK, thanks. I’ll try power cycling the thermostats tomorrow (they are hard-wired, and cutting power means a trip down to the garage [I didn’t put the panel there]). This shouldn’t be too difficult, as I only have 6 thermostats, and the other Iris plug is on this floor.

I’ll reply to this thread to let you know how it all went. I suspect, though, that until I power cycle the ‘correct’ device, this will keep happening. I had already power cycled the recently re-added Iris smart plug a few times with no luck. I think you’re right that one of the thermostats is sending some wonky business through the network.

Just so I am clear though: if I power cycle everything and still have no luck, we go to the nuclear option of deleting and re-adding devices, correct?

Thanks for your help.

If you have access to your logs, you may be able to figure out which one. I had one device go crazy once and cause infinite Luup reload loops on the Vera… I dug the problem child’s ID out of the log.

OK, I just tried to power cycle the other Iris smart plug, and no dice. Looks like I have to find out which device is causing the mayhem and isolate it; otherwise I’ll have to delete and re-add them individually.

Can you give me an idea of what I am looking for in the logs? Are there specific keywords I can Ctrl-F?

You’ll want to do this one at a time to try and isolate which device is the culprit. Hopefully a power cycle will restore normal operation to the errant device. If you have a device that has really failed, you’ll know when bringing that device back online kills the network.

If you can, try to find when the problem started (I am seeing ‘was: 26 now: 27’ poll fails). You should be able to find the point in time when it all began and figure out which device caused it.
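Not something from the thread, but a sketch of how that log search might look from a shell. The log path /var/log/cmh/LuaUPnP.log is an assumption (the usual location on Vera firmware; adjust for yours); the sample lines below are copied from the excerpt earlier in the thread so the pipeline can be tried anywhere:

```shell
# Sample lines copied from the log excerpt above. On a live Vera you would
# point the pipeline at the real log instead of this sample file
# (typically /var/log/cmh/LuaUPnP.log -- an assumption, adjust as needed).
cat > /tmp/LuaUPnP.sample.log <<'EOF'
06 12/01/19 10:02:06.403 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 14 now: 15
06 12/01/19 10:05:16.856 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 23 now: 24
06 12/01/19 10:05:21.583 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 26 now: 27
EOF

# For each ConsecutivePollFails update, print "count device N", worst first.
# The device whose count started climbing earliest is the prime suspect.
grep 'ConsecutivePollFails' /tmp/LuaUPnP.sample.log \
  | awk '{for (i = 1; i <= NF; i++) {
            if ($i == "device:") d = $(i+1)
            if ($i == "now:")    n = $(i+1)
          }
          print n, "device", d}' \
  | sort -rn
```

If you’re Ctrl-F-ing in the browser log view instead, the useful keywords from the excerpt are ConsecutivePollFails, “ACK Delivery Failed”, and “Response Error”; the dev:NNN in the Response Error lines identifies the device.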

Haha, my log access doesn’t go back that far. I use my browser to view the logs (I’m not at all comfortable with Terminal).

Unfortunately I’m here now: ConsecutivePollFails was: 703 now: 704

Hi Phil, if this is still an ongoing issue I advise checking with our support team as well. Click on the “need help” button in the header bar to submit a ticket.

I think I found the problem… or at least part of the problem…

The only RED I have in my logs before Reactor starts going haywire is the following.

UserData::WriteUserData saved–before move File Size: 46411 save size 46411 <0x76724520>

I’ve powered down all of my powered Zigbee devices so no more interference there.

The UserData log entry is not an error. It’s Vera’s way of telling you that the periodic save of your user data JSON file has occurred. This is normal, in spite of the red highlighting and obscure phrasing of the log entry.

What about this line?

2 12/02/19 17:01:29.404 UserData::TempLogFileSystemFailure start 0 <0x72eae520>
02 12/02/19 17:01:29.425 UserData::TempLogFileSystemFailure 5139 res:1

Or even this line?
JobHandler_LuaUPnP::RunAction device -1 action urn:micasaverde-com:serviceId:HomeAutomationGateway1/DeletePlugin failed with 401/Invalid plugin <0x72eae520>

Just looking for something nefarious, because Reactor detects that the system was restarted many times right after that.

Just to be clear, what I am seeing here is a Luup restart, correct?

This is happening once every few minutes.