Need advice : log reading ; clean setup

It’s now 8 months I’m have my Vera Lite and my Zwave network is continuously growing.
During these 8 months I did a lot of tests, from failed basic plug-in creation to devices advanced settings change and of course a few very basic lua coding for scene.

Even if all is mostly running well, I regularly have some strange issue and my logs are full of red and yellow lines.

For example, I can’t get a motion sensor to became reliable when far from Vera. It looks like the routing is not updating.
The sensor is working well close to Vera but not when at the other side of the house. Even after waiting for Network healing or forced update of neighbours and even if the Z-wave dimmer just near don’t have communication issue.
I’ve tried to switch front and read house sensors but problem remain at that place, it’s not following the sensor.

Today I have an other issue. I try to add lua code to avoid the door lamp to turn on when motion during the day.
Since that time, even if I remove all code, I have the “ERROR : Error in lua for scenes and events” on top of my web portal.

So I more and more wonder if I should not restart from scratch with a clean setup.
But It will be a lot of work. do you think it can be a good idea ?
Or is there other way to start from a clean base ? I have a second new Vera Lite. Maybe it can help.

Any tips regarding the issues above or the from scratch scenario are welcome.

Thank you and good winter break !

PS : log sample :

[code]02 12/23/12 11:27:37.868 ZW_Send_Data node 9 NO ROUTE (nil) <0x2b733680>

02 12/23/12 11:28:05.103 ZW_Send_Data node 17 USING ROUTE 38.34.0.0 <0x2b733680>
02 12/23/12 11:28:07.367 ZW_Send_Data node 17 USING ROUTE 38.34.0.0 <0x2b733680>

02 12/23/12 11:28:20.886 ZWJob_SendData UPDATE MANUAL ROUTE 45=(nil) <0x2b333680>
02 12/23/12 11:28:20.889 UPDATE MANUAL ROUTE2 45=(nil) <0x2b733680>
02 12/23/12 11:28:20.889 ZW_Send_Data node 45 NO ROUTE (nil) <0x2b733680>
04 12/23/12 11:28:20.997 <0x2b333680>
06 12/23/12 11:28:21.036 Device_Variable::m_szValue_set device: 164 service: urn:micasaverde-com:serviceId:ZWaveDevice1 variable: LastWakeup was: 1356258209 now: 1356258501 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2b333680>
02 12/23/12 11:28:21.037 ZWaveNode::Wakeup did a poll for 45 881 seconds interval 10800 existing (nil) heal (nil) <0x2b333680>
02 12/23/12 11:28:21.038 ZWJob_SendData UPDATE MANUAL ROUTE 45=(nil) <0x2b333680>
02 12/23/12 11:28:21.039 UPDATE MANUAL ROUTE2 45=(nil) <0x2b733680>
02 12/23/12 11:28:21.040 ZW_Send_Data node 45 NO ROUTE (nil) <0x2b733680>

02 12/23/12 11:28:35.103 ZW_Send_Data node 18 USING ROUTE 38.35.0.0 <0x2b733680>

02 12/23/12 11:29:05.103 ZW_Send_Data node 26 USING ROUTE 36.0.0.0 <0x2b733680>

02 12/23/12 11:29:35.103 ZW_Send_Data node 28 USING ROUTE 38.37.36.0 <0x2b733680>

02 12/23/12 11:29:37.427 ZWaveSerial::GetFrame 0x2b732f60 timed out now 0 m_listGetFramePending 0 <0x2b733680>
02 12/23/12 11:29:37.427 ZWaveJobHandler::SendDataAbort got m_iFrameID 0 <0x2b733680>
01 12/23/12 11:29:37.428 ZWJob_PollNode::Run job job#67 :pollnode #28 (0x106d6a8) N:28 P:100 S:1 ZW_Send_Data to node 28 failed 1 req 0x105f190/6 abort m_iFrameID 0 <0x2b733680>
02 12/23/12 11:29:37.428 ZWJob_PollNode::PollFailed job job#67 :pollnode #28 (0x106d6a8) N:28 P:100 S:1 node 28 battery 0 <0x2b733680>
06 12/23/12 11:29:37.429 Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: LastError was: Node 20 failed now: Poll failed #hooks: 0 upnp: 0 v:0xc93c18/NONE duplicate:0 <0x2b733680>
04 12/23/12 11:29:37.430 <0x2b733680>
02 12/23/12 11:29:37.431 JobHandler::PurgeCompletedJobs purge job#67 :pollnode #28 (0x106d6a8) N:28 P:100 S:2 pollnode #28 1 cmds status 2 <0x2b733680>

02 12/23/12 11:33:14.437 ZWJob_SendData UPDATE MANUAL ROUTE 45=(nil) <0x2b333680>
02 12/23/12 11:33:14.438 UPDATE MANUAL ROUTE2 45=(nil) <0x2b733680>
02 12/23/12 11:33:14.439 ZW_Send_Data node 45 NO ROUTE (nil) <0x2b733680>
04 12/23/12 11:33:14.527 <0x2b333680>
06 12/23/12 11:33:14.578 Device_Variable::m_szValue_set device: 164 service: urn:micasaverde-com:serviceId:ZWaveDevice1 variable: LastWakeup was: 1356258501 now: 1356258794 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2b333680>
02 12/23/12 11:33:14.578 ZWaveNode::Wakeup did a poll for 45 1174 seconds interval 10800 existing (nil) heal (nil) <0x2b333680>
02 12/23/12 11:33:14.579 ZWJob_SendData UPDATE MANUAL ROUTE 45=(nil) <0x2b333680>
02 12/23/12 11:33:14.582 UPDATE MANUAL ROUTE2 45=(nil) <0x2b733680>
02 12/23/12 11:33:14.582 ZW_Send_Data node 45 NO ROUTE (nil) <0x2b733680>
04 12/23/12 11:33:14.687 <0x2b333680>

01 12/23/12 11:41:43.043 ZWJob_PollNode::ReceivedFrame job job#99 :pollnode #9 (0x10a7b18) N:9 P:100 S:5 got FUNC_ID_APPLICATION_COMMAND_HANDLER node info expected 9 got 2 <0x2b333680>
06 12/23/12 11:41:43.044 Device_Variable::m_szValue_set device: 3 service: urn:micasaverde-com:serviceId:SecuritySensor1 variable: Tripped was: 0 now: 1 #hooks: 1 upnp: 0 v:0xae6f28/NONE duplicate:0 <0x2b333680>
07 12/23/12 11:41:43.044 Event::Evaluate 1 motion_kitchen scene Intrusion is false repeat 0/-1 <0x2b333680>
06 12/23/12 11:41:43.045 Device_Variable::m_szValue_set device: 3 service: urn:micasaverde-com:serviceId:SecuritySensor1 variable: LastTrip was: 1356258944 now: 1356259303 #hooks: 0 upnp: 0 v:0xae7370/NONE duplicate:0 <0x2b333680>
01 12/23/12 11:41:51.323 ZWJob_PollNode::ReceivedFrame job job#99 :pollnode #9 (0x10a7b18) N:9 P:100 S:5 got FUNC_ID_APPLICATION_COMMAND_HANDLER node info expected 9 got 18 <0x2b333680>
06 12/23/12 11:41:51.324 Device_Variable::m_szValue_set device: 31 service: urn:micasaverde-com:serviceId:EnergyMetering1 variable: Watts was: 0 now: 0 #hooks: 0 upnp: 0 v:0xb21250/NONE duplicate:1 <0x2b333680>
06 12/23/12 11:41:51.324 Device_Variable::m_szValue_set device: 31 service: urn:micasaverde-com:serviceId:EnergyMetering1 variable: ActualUsage was: 1 now: 1 #hooks: 0 upnp: 0 v:0xb21fd0/NONE duplicate:1 <0x2b333680>
01 12/23/12 11:41:51.583 ZWJob_PollNode::ReceivedFrame job job#99 :pollnode #9 (0x10a7b18) N:9 P:100 S:5 got FUNC_ID_APPLICATION_COMMAND_HANDLER node info expected 9 got 23 <0x2b333680>
06 12/23/12 11:41:51.584 Device_Variable::m_szValue_set device: 110 service: urn:micasaverde-com:serviceId:ZWaveDevice1 variable: LastWakeup was: 1356257502 now: 1356259311 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2b333680>
02 12/23/12 11:41:51.584 ZWaveNode::Wakeup did a poll for 23 1809 seconds interval 10800 existing (nil) heal (nil) <0x2b333680>
02 12/23/12 11:41:51.585 ZWJob_SendData UPDATE MANUAL ROUTE 23=(nil) <0x2b333680>
01 12/23/12 11:41:54.263 ZWJob_PollNode::ReceivedFrame job job#99 :pollnode #9 (0x10a7b18) N:9 P:100 S:5 got FUNC_ID_APPLICATION_COMMAND_HANDLER node info expected 9 got 2 <0x2b333680>

44 now: 1356259358 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2ed60680>
02 12/23/12 11:42:49.424 ZWaveSerial::GetFrame 0x2b732ef8 timed out now 0 m_listGetFramePending 0 <0x2b733680>
02 12/23/12 11:42:49.424 ZWaveJobHandler::SendDataAbort got m_iFrameID 0 <0x2b733680>
01 12/23/12 11:42:49.425 ZWaveNode::CurrentRouteFailed node 17 device 30 position 8 route 38.34.0.0 hop -1 (from 0) <0x2b733680>
06 12/23/12 11:42:49.426 Device_Variable::m_szValue_set device: 30 service: urn:micasaverde-com:serviceId:ZWaveDevice1 variable: AutoRoute was: 43-38x,38.34-40,36.39-54,43.39-54 now: 43-38x,38.34-40x,36.39-54,43.39-54 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2b733680>
06 12/23/12 11:42:49.426 Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: LastRouteFailure was: 1356259257 now: 1356259369 #hooks: 0 upnp: 0 v:(nil)/NONE duplicate:0 <0x2b733680>
02 12/23/12 11:42:49.427 CurrentRouteFailed XXX1 (null) <0x2b733680>
02 12/23/12 11:42:49.427 CurrentRouteFailed XXX3 43-38x,38.34-40x,36.39-54,43.39-54 <0x2b733680>
02 12/23/12 11:42:49.427 CurrentRouteFailed XXX4 <0x2b733680>
02 12/23/12 11:42:49.428 UpdateManualRoute XXX1 node 17 route 43-38x,38.34-40x,36.39-54,43.39-54 <0x2b733680>
02 12/23/12 11:42:49.428 UpdateManualRoute XXX2 36.39.0.0 from 43-38x,38.34-40x,36.39-54,43.39-54 <0x2b733680>
02 12/23/12 11:42:49.428 XXXX CurrentRouteFailed node 17 m_cRoutePosition 18 route 36.39.0.0 <0x2b733680>
02 12/23/12 11:42:49.428 ZWJob_PollNode::ReturnMessageNotReceived -end- job job#101 :pollnode #17 (0x10a83b0) N:17 P:100 S:5 node 17 didn’t reply iOK 4 iBadTx 0 iNoReply 1 Abort returned m_iFrameID 0 <0x2b733680>
02 12/23/12 11:42:49.429 ZWJob_PollNode::PollFailed job job#101 :pollnode #17 (0x10a83b0) N:17 P:100 S:5 node 17 battery 0 <0x2b733680>
06 12/23/12 11:42:49.429 Device_Variable::m_szValue_set device: 1 service: urn:micasaverde-com:serviceId:ZWaveNetwork1 variable: LastError was: Poll failed now: Poll failed #hooks: 0 upnp: 0 v:0xc93c18/NONE duplicate:1 <0x2b733680>
04 12/23/12 11:42:49.430 <0x2b733680>
02 12/23/12 11:42:49.430 JobHandler::PurgeCompletedJobs purge job#101 :pollnode #17 (0x10a83b0) N:17 P:100 S:2 pollnode #17 3 cmds status 2 <0x2b733680>

02 12/23/12 11:45:23.008 ZWJob_SendData UPDATE MANUAL ROUTE 18=0xab5cff <0x2c533680>
06 12/23/12 11:45:23.009 Device_Variable::m_szValue_set device: 77 service: urn:futzle-com:serviceId:CountdownTimer1 variable: Event was: 1 now: 0 #hooks: 1 upnp: 0 v:0x8d74e8/NONE duplicate:0 <0x2c533680>
07 12/23/12 11:45:23.009 Event::Evaluate 12 LampeMaisonAV-OFF scene LampeMaisonAuto-OFF is false repeat 0/-1 <0x2c533680>
02 12/23/12 11:45:23.011 UPDATE MANUAL ROUTE2 18=0xab5cff <0x2b733680>
02 12/23/12 11:45:23.011 ZW_Send_Data node 18 USING ROUTE 38.35.0.0 <0x2b733680>
02 12/23/12 11:45:23.105 UPDATE MANUAL ROUTE2 18=0xab5cff <0x2b733680>
02 12/23/12 11:45:23.106 ZW_Send_Data node 18 USING ROUTE 38.35.0.0 <0x2b733680>
01 12/23/12 11:45:23.173 got CAN <0x2bb33680>
02 12/23/12 11:45:23.174 ZWaveSerial::Send m_iFrameID 906 got a CAN – Dongle is in a bad state. Wait 1 second before continuing to let it try to recover. <0x2b733680>
[/code]

Hello Thomasss,

The same question, sometimes my Vera crash and I’m wondering if it could be a solution to reinstall everything.

Another similar point, when the program crash, the system seems to still be alive, I mean, I can ping it. So is there a known and clean solution to restart the Vera program, as the system is always “alive”, something like a watchdog system, that (software) restarts the Vera when crashed.

You should not need to re-install everything.
Just go to a backup where things were working.

Thanks, that I’ve already done. The question is: it does not crash often, maybe once a month, but always when I’m away from home, so do you have a solution for this or should I open another post?

"
Another similar point, when the program crash, the system seems to still be alive, I mean, I can ping it. So is there a known and clean solution to restart the Vera program, as the system is always “alive”, something like a watchdog system, that (software) restarts the Vera when crashed.
"

There are a lot of moving parts in Vera. They all need to work in a normal setup.
When you say it crashes but still can be pinged … that tells me that some component stopped working. Without access to the log files it would not be possible to analyze.

If you add a USB memory stick for saving logs on your machine … it should hold a couple of days of information locally.

Normally log files are sent to the MCV server when they get to be a certain size. You can file a help ticket to have them look over the logs for the problem. You will need to be specific in terms of when the problem happened.

There are ways to save log files on your machine for you to look at. Requires some linux experience to edit some files on Vera … and ssh and scp to login to Vera and move files around.

Once again, thank you for your answer, but I think I’m not clear enough…

I’ve already contacted Vera support, no news.
I’ve already a usb stick plugged in the vera, great, but changes nothing.

As Vera is in charge of a lot of security stuffs and climatisation and other pumps of the whole house, I need a reliable system to do that. This is not the case now with Vera…

I’m just asking if there is a way, in such a situation, when the Vera Software is not responding in http and with cmh, but the PING is responding, TO REBOOT the Vera system automatically?

Thanks, chris

Hello Chris,

I would divide Vera into three tiers, the hardware (physical), Mios engine and User interface (UI).
Under this simple scenario, I have a car, the engine is running but I can’t get in… this example, even though you may not access the car, everything else is functional. Thus, for the Vera, you would still be able to ping the Vera even though you do not have access via the Internet.

So, when you state crashes or “when the Vera Software is not responding in http and with cmh”, are you referring to not being able to access the UI or they system fails? A simple test maybe an http call to execute a scene or change a device state and establish if the underlying device is functioning. If you need some pointers have look here http://wiki.micasaverde.com/index.php/Luup_Requests#actions.
Or, there is also a logical test lu_alive http://Vera_ip:3480/data_request?id=lu_alive but the UI is not functioning see http://wiki.micasaverde.com/index.php/UI_Simple

If the underlying actions are available, you could setup a scene to execute os.execute( ‘reboot’ )
See http://wiki.micasaverde.com/index.php/Restart (this could also be setup as a cyclic scene)

… Not the answer but I hope this give you something to work from.

Thank you Brientim for this good explanation.

"
So, when you state crashes or “when the Vera Software is not responding in http and with cmh”, are you referring to not being able to access the UI or they system fails?
"
Yes, unable to access to the UI via web browser or other application AutHomation with android. I can see for example that scenes are no more running. For example lights are still on after 11PM, and scene “switch off lights 10:45PM” is not running. The VeraLite lights are still ON.

I’ve seen in another old 2011 post that a new firmware was supposed to handle that kind of “watchdog” problem. Did you hear about that?

Anyways thanks for the answer.

chris

The only thing came to mind was someone have two unit one watching the other and being connected to a power switch. When the ping failed, the other units power would be switched off and then restarted after x minutes.

However, if the problem is reoccurring, the underlying problem needs to be identified and addressed otherwise you will never have reliability.

I think the underlying problem is the LAN RFXCOM plugin, still waiting for a stable version :wink: The last time I asked Vera Support, they told me that this plugin has generated an error. I like very much open systems, but the problem is that the “main” program (Micasa Vera code) should handle errors generated by the third party plugins. No?

Richard, what can I say to this?

I use a fitpc2 with Microsoft XP as a DVR for my IP Cams, as Vera is still unable to do that… This PC is working 24h a day and NEVER crash. And with the Securtity suite provided by Microsoft, I’ve never seen a virus corupting my PC… The Core application of Vera software should be better protected, and I would like to leave my house for vacations without to be afraid that the Vera will freeze. That’s my opinion.

Vera is no t a do all system. A dedicated dvr for cameras will always perform better, same for a dedicated alarm system. I use my Vera for automation of lights, locks, etc and nothing else. If I want to record video, I would use a product just for that.

As for preventing errors from third party developers. There is only so much that can be done. It is almost impossible to try and prevent shoddy code. It is up to the developer to fix there bugs and write better code. No developer is perfect and bugs will happen.

  • Garrett

Thanks Garrett, but I’m sure you know Murphy’s law? When I stay at home for one month, it works perfectly but almost each time I leave the house for a couple of days and try to connect to the system, it is crashed, so I try to find a solution to prevent this, that’s all. :cry: