Zigbee Devices Offline... ALL OF THEM

Hello, I’ve been having some problems with Reactor and couldn’t figure out why. Now I’m thinking it has something to do with my Zigbee network.

My entire Zigbee network is down. I’ve tried restoring to a backup from about a week ago (I definitely wasn’t having any problems then), but the same problem persists. I have 6 Stelpro SMT402 Zigbee thermostats and 2 Iris Smart Plugs. None of those devices are responding to commands.

I’m getting this in the log for all of my Zigbee devices. (Device 126 is a Stelpro thermostat, but I’m getting the same sort of logs for all of them.)

JobHandler::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:0 Id: 34 is 49.577123000 seconds old <0x7583c520>
02 12/01/19 10:02:01.683 ZBJob_PollNode::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:0 Id: 34 Sending Command <0x7583c520>
02 12/01/19 10:02:06.390 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:02:06.391 ZBJob_PollNode::receiveFrame job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:5 Id: 34 Response Error <0x75a3c520>
02 12/01/19 10:02:06.402 ZBJob_PollNode::Run job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:5 Id: 34 Job finished with errors <0x7583c520>
06 12/01/19 10:02:06.403 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 14 now: 15 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:02:06.403 Device_Basic::AddPoll 126 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:02:06.404 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.70 now: 4.60 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:02:06.405 <0x7583c520>
02 12/01/19 10:02:06.405 JobHandler::PurgeCompletedJobs purge job#34 :zb_polling dev:126 (0x187f4b0) P:100 S:2 Id: 34 zb_polling status 2 <0x7583c520>

02 12/01/19 10:05:12.101 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 96 <0x7663c520>
02 12/01/19 10:05:12.102 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 99 <0x7663c520>
02 12/01/19 10:05:12.123 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 101 <0x7663c520>
02 12/01/19 10:05:12.123 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 104 <0x7663c520>
02 12/01/19 10:05:12.124 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 110 <0x7663c520>
02 12/01/19 10:05:12.125 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 121 <0x7663c520>
02 12/01/19 10:05:12.125 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 126 <0x7663c520>
02 12/01/19 10:05:12.126 ZigbeeJobHandler::ServicePollLoop Adding PollJob for Device 133 <0x7663c520>
02 12/01/19 10:05:12.132 ZBJob_PollNode::Run job#61 :zb_polling dev:96 (0x175b318) P:100 S:0 Id: 61 Sending Command <0x7583c520>
02 12/01/19 10:05:16.844 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:05:16.844 ZBJob_PollNode::receiveFrame job#61 :zb_polling dev:96 (0x175b318) P:100 S:5 Id: 61 Response Error <0x75a3c520>
02 12/01/19 10:05:16.856 ZBJob_PollNode::Run job#61 :zb_polling dev:96 (0x175b318) P:100 S:5 Id: 61 Job finished with errors <0x7583c520>
06 12/01/19 10:05:16.856 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 23 now: 24 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:05:16.857 Device_Basic::AddPoll 96 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:05:16.858 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.20 now: 4.10 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:05:16.858 <0x7583c520>
02 12/01/19 10:05:16.859 JobHandler::PurgeCompletedJobs purge job#61 :zb_polling dev:96 (0x175b318) P:100 S:2 Id: 61 zb_polling status 2 <0x7583c520>
02 12/01/19 10:05:16.859 JobHandler::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:0 Id: 62 is 4.757592000 seconds old <0x7583c520>
02 12/01/19 10:05:16.859 ZBJob_PollNode::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:0 Id: 62 Sending Command <0x7583c520>
02 12/01/19 10:05:21.570 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66 <0x75a3c520>
01 12/01/19 10:05:21.570 ZBJob_PollNode::receiveFrame job#62 :zb_polling dev:99 (0x175b508) P:100 S:5 Id: 62 Response Error <0x75a3c520>
02 12/01/19 10:05:21.582 ZBJob_PollNode::Run job#62 :zb_polling dev:99 (0x175b508) P:100 S:5 Id: 62 Job finished with errors <0x7583c520>
06 12/01/19 10:05:21.583 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 26 now: 27 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
02 12/01/19 10:05:21.583 Device_Basic::AddPoll 99 poll list full, deleting old one <0x7583c520>
06 12/01/19 10:05:21.584 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:HaDevice1 variable: PollRatings was: 4.20 now: 4.10 #hooks: 0 upnp: 0 skip: 0 v:(nil)/NONE duplicate:0 <0x7583c520>
04 12/01/19 10:05:21.585 <0x7583c520>
02 12/01/19 10:05:21.585 JobHandler::PurgeCompletedJobs purge job#62 :zb_polling dev:99 (0x175b508) P:100 S:2 Id: 62 zb_polling status 2 <0x7583c520>
02 12/01/19 10:05:21.585 JobHandler::Run job#63 :zb_polling dev:101 (0x187f458) P:100 S:0 Id: 63 is 9.463236000 seconds old <0x7583c520>
02 12/01/19 10:05:21.586 ZBJob_PollNode::Run job#63 :zb_polling dev:101 (0x187f458) P:100 S:0 Id: 63 Sending Command <0x7583c520>
02 12/01/19 10:05:26.297 ZigbeeCommand::HandleResponse ACK Delivery Failed with status 0x66

Has anyone had any similar experience with their entire Zigbee network failing?

Have you power cycled the Vera? If the Zigbee co-processor is stuck, this might unstick it.

Hi, I have indeed tried to power cycle. I’ve performed both hard and soft reboots of the Vera with no luck. I’ve also restored to a backup from about a week ago (I know for sure that things were working then) and yet still no Zigbee.

If it were just one or two devices I’d think it was the device itself, but it’s the ENTIRE Zigbee network. It has to be the Vera, no?

The only thing I can think of in this case is a failed Zigbee chip. It’s a hardware failure. To validate: after your reboot, could you reset and try to add a new Zigbee device?

That, or perhaps a failed Zigbee device that’s jabbering on the network. @rafale77’s idea is a good one and should give us more data. The other possibility is that some software component of the Zigbee network has become corrupted.

OK, so I did another hard reboot, waiting about 30 minutes before powering back up, just to make sure the device was starting cold. When I got back into the UI, all my Zigbee devices were still displaying ‘Can’t Detect Device’.

I then removed one of my Iris Smart Plugs and re-added it. The device was successfully re-added and is working.

My other Zigbee devices are still listed as ‘Can’t Detect Device’. Is my only option to remove and re-add all of my devices?

I’d like to find out what caused this problem in the first place if I am able.

Not sure where to go from here. If I need to re-add everything, so be it. Any ideas on how to troubleshoot this?

Cheers,

One other trick. Power cycle each of them without deleting and re-adding them. It now appears that you have one device messing with your network.

Yup. Next step.

OK, thanks. I’ll try power cycling the thermostats tomorrow (they are hard-wired, and cutting power means a trip down to the garage [I didn’t put the panel there]). This shouldn’t be too difficult, as I only have 6 thermostats, and the other Iris plug is on this floor.

I’ll reply to this thread to let you know how it all went. I suspect, though, that until I power cycle the ‘correct’ device, this will keep happening. I had already power cycled the recently re-added Iris smart plug a few times with no luck. I think you’re right that one of the thermostats is sending some wonky business through the network.

Just so I am clear though: if I power cycle everything and still have no luck, we go to the nuclear option of deleting and re-adding devices, correct?

Thanks for your help.

If you have access to your logs, you may be able to figure out which one. I had one device go crazy once and cause infinite Luup reload loops on the Vera… I dug the problem child’s ID out of the log.

OK, I just tried to power cycle the other Iris smart plug, and no dice. Looks like I have to find out which device is causing the mayhem and isolate it; otherwise I’ll have to delete and re-add them individually.

Can you give me an idea of what I am looking for in the logs? Are there specific keywords I can Ctrl-F?

You’ll want to do this one at a time to try and isolate which device is the culprit. Hopefully a power cycle will restore normal operation to the errant device. If you have a device that has really failed, you’ll know when bringing that device back online kills the network.

If you can, try to find when the problem started (I am seeing ‘was: 26 now: 27’ poll fails). You should be able to find the point in time when it all began and figure out which device caused it.
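Not something from the thread, but a sketch of how that log search might look from a shell. The log path /var/log/cmh/LuaUPnP.log is an assumption (the usual location on Vera firmware; adjust for yours); the sample lines below are copied from the excerpt earlier in the thread so the pipeline can be tried anywhere:

```shell
# Sample lines copied from the log excerpt above. On a live Vera you would
# point the pipeline at the real log instead of this sample file
# (typically /var/log/cmh/LuaUPnP.log -- an assumption, adjust as needed).
cat > /tmp/LuaUPnP.sample.log <<'EOF'
06 12/01/19 10:02:06.403 Device_Variable::m_szValue_set device: 126 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 14 now: 15
06 12/01/19 10:05:16.856 Device_Variable::m_szValue_set device: 96 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 23 now: 24
06 12/01/19 10:05:21.583 Device_Variable::m_szValue_set device: 99 service: urn:micasaverde-com:serviceId:ZigbeeNetwork1 variable: ConsecutivePollFails was: 26 now: 27
EOF

# For each ConsecutivePollFails update, print "count device N", worst first.
# The device whose count started climbing earliest is the prime suspect.
grep 'ConsecutivePollFails' /tmp/LuaUPnP.sample.log \
  | awk '{for (i = 1; i <= NF; i++) {
            if ($i == "device:") d = $(i+1)
            if ($i == "now:")    n = $(i+1)
          }
          print n, "device", d}' \
  | sort -rn
```

If you’re Ctrl-F-ing in the browser log view instead, the useful keywords from the excerpt are ConsecutivePollFails, “ACK Delivery Failed”, and “Response Error”; the dev:NNN in the Response Error lines identifies the device.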

Haha, my log access doesn’t go back that far. I use my browser to view the logs (I’m not at all comfortable with Terminal).

Unfortunately I’m here now: ConsecutivePollFails was: 703 now: 704

Hi Phil, if this is still an ongoing issue I advise checking with our support team as well. Click on the “need help” button in the header bar to submit a ticket.

I think I found the problem… or at least part of the problem…

The only RED I have in my logs before Reactor starts going haywire is the following.

UserData::WriteUserData saved–before move File Size: 46411 save size 46411 <0x76724520>

I’ve powered down all of my powered Zigbee devices so no more interference there.

The UserData log entry is not an error. It’s Vera’s way of telling you that the periodic save of your user data JSON file has occurred. This is normal, in spite of the red highlighting and obscure phrasing of the log entry.

What about this line?

2 12/02/19 17:01:29.404 UserData::TempLogFileSystemFailure start 0 <0x72eae520>
02 12/02/19 17:01:29.425 UserData::TempLogFileSystemFailure 5139 res:1

Or even this line?
JobHandler_LuaUPnP::RunAction device -1 action urn:micasaverde-com:serviceId:HomeAutomationGateway1/DeletePlugin failed with 401/Invalid plugin <0x72eae520>

Just looking for something nefarious, because Reactor detects that the system was restarted many times right after that.

Just to be clear, what I am seeing here is a Luup restart, correct?

This is happening once every few minutes.