timed http request crashes

Hi,

I created a new implementation file for a Custom Dimmable Light device (controlled using web service requests).

I refresh the status of this device every 5 seconds by making http.request calls and getting its status (using luup.calltimer) and I can set its status using another web server request in a SetLoadLevelTarget Job (as per the pseudo code below).

Everything works great until a Job is executed at the same time when the refresh function is called.
Both http calls fail crashing luup at times with the below errors.

[tt]
01 10/14/11 19:17:01.225 LuaInterface::CallFunction-5 function refreshStatus failed /usr/lib/lua/socket/http.lua:145: attempt to call field ‘try’ (a string value) <0x402>
.
.
.

01 10/14/11 19:17:02.141 LuaInterface::CallFunction-1 device 60 function SDimming1_Dimming1_SetLoadLevelTarget_job failed /usr/lib/lua/socket/http.lua:150: attempt to call field ‘try’ (a string value) <0x400>
01 10/14/11 19:17:02.141 Lua_Job::Run job#27 : (0x9f28a0) P:50 S:0 failed <0x400>
[/tt]

It seems that LuaSocket does not implement any multithreading and a function called using luup.calltimer is not serialized or put in a queue.

Is there any way to create a timed function that runs within a user created job so both http calls are not executed at the exact same time ?

pseudo-code: (I_CustomDimmableLight1.xml)
[tt]

	local http=require('socket.http')

	function refreshStatus()
		-- Calling the function in 5 seconds
		luup.call_timer("refreshStatus", 1, "5", "", "")
		
		-- Initiate a web request to refresh the device status
		socket.http.request{...}

	end

</functions>
<startup>Startup</startup>
<actionList>
	<action>
		<serviceId>urn:upnp-org:serviceId:Dimming1</serviceId>
		<name>SetLoadLevelTarget</name>
		<job>
			-- Initiate a web request to set the device status
			socket.http.request{...}

		</job>

	</action>
</actionList>

[/tt]

Roy

Try call_delay. It has a parameter thread:

Thank you florin for the fast reply,

I didn’t notice this parameter on call_delay before.

I replaced call_timer with [tt]luup.call_delay(“refreshStatus”, 5, “”, 1)[/tt] but unfortunately I’m still getting the same errors :frowning:

Can I have Vera poll the device status every 5 seconds (I implemented the urn:micasaverde-com:serviceId:HaDevice1 Poll action and having it call the web service) without using timers ?

thanks,
Roy.

Try to implement mutual exclusion using a lock:

<functions>

      local http=require('socket.http')
      lock = 0

      function refreshStatus()      
         -- Wait until the other job releases the lock.
         while (lock == 1) do end
         
         lock = 1 -- Acquire the lock
             socket.http.request{...}
         lock = 0 -- Release the lock

         -- Calling the function in 5 seconds
         luup.call_delay ("refreshStatus", 5, "", 1)

      end
...............

   </functions>
   <startup>Startup</startup>
    <actionList>
      <action>
         <serviceId>urn:upnp-org:serviceId:Dimming1</serviceId>
         <name>SetLoadLevelTarget</name>
         <job>
            -- Wait until the other job releases the lock.
            while (lock == 1) do end

            lock = 1 -- Acquire the lock.
                socket.http.request{...}
            lock = 0 -- Release the lock.
         </job>
      </action>
    </actionList>

That wont work either. Lua is single threaded, but MiOS is making concurrent calls to it across threads. The more frequent these calls, the.more likely a timer & IO collision will occur.

@MCV was working on some changes in this space to avoid the issue, since it cannot be avoided from within lua code itself.

@guessed is right. I had tried this without any luck.
I also tried to implement the below logic (Calling the Poll action from the refresh function); but this time, luup crashed with each call_action call to Poll ???

luup.call_action returned weird values (null for lul_resultcode, table for lul_job)!!


<functions>

      local http=require('socket.http')

      function refreshStatus()
         -- Calling the function in 5 seconds
         luup.call_timer("refreshStatus", 1, "5", "", "")
         
	local lul_arguments = {}
	local lul_resultcode, lul_resultstring, lul_job, lul_returnarguments
	lul_resultcode, lul_resultstring, lul_job, lul_returnarguments = luup.call_action("urn:micasaverde-com:serviceId:HaDevice1", "Poll", lul_arguments, tonumber(lul_device))

      end

......
   </functions>
   <startup>Startup</startup>
    <actionList>
.
.
.
.

      <action>
         <serviceId>urn:micasaverde-com:serviceId:HaDevice1</serviceId>
         <name>Poll</name>
         <job>
       
               socket.http.request{...}
         </job>
      </action>
    </actionList>

Try using coroutines. Hopefully this will work.

<functions>
    local http = require ('socket.http')

    function refreshStatus (args)
        local co = coroutine.create (http.request)
        coroutine.resume (args)

        -- Call the function in 5 seconds.
        luup.call_delay ("refreshStatus", 5, args, 1)
    end
    ...............
</functions>
<startup>Startup</startup>
<actionList>
    <action>
        <serviceId>urn:upnp-org:serviceId:Dimming1</serviceId>
        <name>SetLoadLevelTarget</name>
        <job>
            refreshStatus (args)
        </job>
    </action>
</actionList>

Do you return anything at the end of the function?

<actionList>
    <action>
        <serviceId>urn:upnp-org:serviceId:Dimming1</serviceId>
        <name>SetLoadLevelTarget</name>
        <job>
            refreshStatus (args)
            return 4, 10
        </job>
    </action>
</actionList>

Sorry, I just noticed that I modified your post instead of hitting Reply. Must be the lack of sleep. Does anybody know if there is a revert function for forum posts? ;D

For future references (And in case no revert function exist): I replied that coroutines seemed to improve things a lot as http.request are now all successful but I’m still getting luup crashes without any apparent cause showing in the log file when calling the SetLoadLevelTarget action repeatedly.

A few memory leaks do show up in the log file and I tried to resolve that by calling collectgarbage(“collect”) in the refreshStatus() function but that didn’t seem to solve it.

As for your question florin, yes i’m returning 4 in case of success and 2 in case of failure (returning only one value).

If it’s not a problem, can you post the logs again?

Thanks.

Sure. These are the last entries before the crash:

[code]06 10/17/11 18:22:53.253 Device_Variable::m_szValue_set device: 60 service: urn:upnp-org:serviceId:SwitchPower1 variable: Status was: 1 now: 1 #hooks: 0 upnp: 0 v:0x864a30/NONE duplicate:1 <0x11444>
01 10/17/11 18:22:53.254 luup_log:60: I_BeckhoffDimLight1: Coroutine successful <0x11444>
06 10/17/11 18:22:56.237 Device_Variable::m_szValue_set device: 59 service: urn:upnp-org:serviceId:SwitchPower1 variable: Status was: 1 now: 1 #hooks: 0 upnp: 0 v:0x864a30/NONE duplicate:1 <0x11845>
01 10/17/11 18:22:58.247 luup_log:60: L_BeckFunc: Level read request received with response code: 1 and c= 200 <0x11c46>
06 10/17/11 18:22:58.252 Device_Variable::m_szValue_set device: 60 service: urn:upnp-org:serviceId:Dimming1 variable: LoadLevelStatus was: 70 now: 70 #hooks: 0 upnp: 0 v:0x8a25d0/NONE duplicate:1 <0x11c46>
06 10/17/11 18:22:58.253 Device_Variable::m_szValue_set device: 60 service: urn:upnp-org:serviceId:SwitchPower1 variable: Status was: 1 now: 1 #hooks: 0 upnp: 0 v:0x864a30/NONE duplicate:1 <0x11c46>
01 10/17/11 18:22:58.254 luup_log:60: I_BeckhoffDimLight1: Coroutine successful <0x11c46>
06 10/17/11 18:23:01.234 Device_Variable::m_szValue_set device: 59 service: urn:upnp-org:serviceId:SwitchPower1 variable: Status was: 1 now: 1 #hooks: 0 upnp: 0 v:0x864a30/NONE duplicate:1 <0x12010>
01 10/17/11 18:23:03.236 luup_log:60: L_BeckFunc: Level read request received with response code: 1 and c= 200 <0x12447>
06 10/17/11 18:23:03.241 Device_Variable::m_szValue_set device: 60 service: urn:upnp-org:serviceId:Dimming1 variable: LoadLevelStatus was: 70 now: 70 #hooks: 0 upnp: 0 v:0x8a25d0/NONE duplicate:1 <0x12447>
06 10/17/11 18:23:03.242 Device_Variable::m_szValue_set device: 60 service: urn:upnp-org:serviceId:SwitchPower1 variable: Status was: 1 now: 1 #hooks: 0 upnp: 0 v:0x864a30/NONE duplicate:1 <0x12447>
01 10/17/11 18:23:03.243 luup_log:60: I_BeckhoffDimLight1: Coroutine successful

2011-10-17 18:23:18 - LuaUPnP Terminated with Exit Code: 0

2011-10-17 18:23:18 - LuaUPnP crash

01 2011-10-17 10:5:34 caught signal 11 <0x400>
01 2011-10-17 10:6:2 caught signal 11 <0x400>
01 2011-10-17 10:6:30 caught signal 11 <0x400>
01 2011-10-17 11:4:12 caught signal 11 <0xb1aa8>
01 2011-10-17 11:51:7 caught signal 11 <0xb82c>
01 2011-10-17 12:48:51 caught signal 5 <0xadea8>
01 2011-10-17 12:48:51 caught signal 4 <0xae2a9>
01 2011-10-17 13:11:56 caught signal 11 <0x1344a>
01 2011-10-17 14:7:38 caught signal 4 <0xb0ea8>
01 2011-10-17 15:12:18 caught signal 4 <0xb0ea8>
01 2011-10-17 15:12:18 caught signal 11 <0x9f2128>
01 2011-10-17 16:35:43 caught signal 4 <0xb2aa8>
01 2011-10-17 17:51:16 caught signal 4 <0xb2aa8>
01 2011-10-17 18:14:37 caught signal 11 <0x400>
01 2011-10-17 18:16:1 caught signal 11 <0xb02a>
01 2011-10-17 18:20:24 caught signal 11 <0x400>
---------------exited-------------
[/code]

I suppose that either Verbose Logging wasn’t enabled or the messages have been filtered with grep ‘^01|^06’ . I can’t tell why it crashed from these logs.

Can you send me a PM with your Vera number?

After many days of debugging, I think I managed to find out the cause of the problem.

Calling luup.variable_set within a timed function (whether using call_timer or call_delay) is creating a memory leak (tested over many days and confirmed this morning).
Setting at the same time the same variable when using a call_action crashes LuaUpnP.

I need to update the device status at least every 5 secs, and I don’t see any other solution besides using call_delay or call_timer.

Is there any way to force vera to call the “poll” action as it is done with z-wave devices ?

Finally I found a workaround for the crashes:

I removed the timed functions from the plugin and used instead the startup Lua script to call the Poll action where I refresh the device’s status.
It’s not the elegant solution I had in mind as I have to edit the startup script every time I add a new device but at least it works!

[code]function refreshCache()
luup.call_delay(“refreshCache”, 5, “”, 1)
luup.call_action(“urn:micasaverde-com:serviceId:HaDevice1”, “Poll”, “”,59)
luup.call_action(“urn:micasaverde-com:serviceId:HaDevice1”, “Poll”, “”,60)
luup.call_action(“urn:micasaverde-com:serviceId:HaDevice1”, “Poll”, “”,61)

end

luup.call_delay(“refreshCache”, 5, “”, 1)[/code]

I’m glad that you found a workaround for this. I would recommend putting the luup.call_delay function after the luup.call_action functions, because polling takes some time and there is the risk that one refreshCache instance will overlap over the other refreshCache instance if polling takes 5 seconds or more.

I added bugs #1939 and #1940 to the Mantis with the problems you reported.

Thanks for the tip florin,

I had put luup.call_delay at the start of the function just in case one of later statements crashes but it I don’t think there’s such a risk in this case.

Cheers,
Roy.

@Roy S,
If you want to guard your code from failures, you can use [tt]pcall[/tt] to wrap the bits that might fail in odd ways:
Lua 5.1 Reference Manual

Good one @guessed!

I didn’t know about that. Nice alternative for the missing (try…catch).

I also thought about adding an extra 10 seconds to the first call_delay call, just in case the devices r not properly initialized when the startup code runs.