Workflow timer cycling too quickly

I have 3-state workflow, cycling round to run my boiler config every minute, and then check it has run. I have noticed several strange things about it - initially that I had some log entries that the timer was being ignored, not a major concern in itself but nothing else can change the state.

luup_log:128: ALTUI: warning: Wkflow - Timer - Timer ignored, active state is different from the timer state <0x756eb520>

I then noticed that the timers are cycling round twice as fast as they should so that my boiler update is called two times every minute, and in the “history” some get labelled with just “name”, some as “timer:name”. I have replaced the timer transitions with identical 20s timers (clones), but get the same problem.
Anyone else seen this effect?
(The workflow cycles round the 3 states, checking for an error, waits for 20s and sets a virtual switch as a heartbeat monitor, waits 20s, runs the boiler update… clearing the switch, then waits another 20s and repeats the process. [If the switch has not been cleared then an error is flagged, but that looks ok] However, some of the timers are firing immediately rather than waiting 20s, doubling the overall rate)


Cheers
Octo

can you share the detailed workflow report too

also would be good to capture an execution in DEBUG mode and share the Luup logs

Report attached below:

Boiler Refresh 0-4 - (0-4)

  • 5 States: Start, Check(boiler), Update Boiler, Boiler Error, Set Heartbeat
  • 6 Transitions: 30s, Heartbeat Stuck, 20s, Clone of 20s, C2 20s, 30s

Details

  • Start - 0e4445ab-695e-41f1-b2ae-29714516f34c
    • Transitions
      • ‘30s’ - fdc966d1-9cc6-42f6-9584-0875dc688eac
  • When
    * Timer: ‘30s’ Duration: 30 seconds * Moves To
    * State: Check(boiler)
  • Check(boiler) - 1fdeb540-9a6c-49e3-9d61-f0bd192172cb
    • warn if boiler scene call failed (eg nil value)
    • OnEnter
      • Lua
        luup.sleep(1) – trail just to see if doing anything fixes loop speed
    • Transitions
      • ‘Heartbeat Stuck’ - 363d2c51-4aa7-4c33-b4e2-25dcdc2dea6e
  • When
    * device:’ Boiler HeartBeat ’ (0-366) when urn:upnp-org:serviceId:SwitchPower1-Target
    tonumber(new) == 1 * Moves To
    * State: Boiler Error
    * ‘20s’ - c1883faf-0e3f-4aa9-9a5d-2e32486c00f0
  • When
    * Timer: ‘20s’ Duration: 20 seconds * Moves To
    * State: Set Heartbeat
  • Update Boiler - 8ec1d53a-7a03-4965-ba40-333b82db60fd
    • OnEnter
      • run BoilerControl (0-74)
    • Transitions
      • ‘Clone of 20s’ - 2e6d9eb1-52e1-4262-9419-42cadd640e8d
  • When
    * Timer: ‘20s’ Duration: 20 seconds * Moves To
    * State: Check(boiler)
  • Boiler Error - f93b7a9f-1233-4cf0-a65d-c27557c6706b
    • Set Heartbeat to avoid repeating the warning
    • OnEnter
      • on Boiler HeartBeat (0-366) urn:upnp-org:serviceId:SwitchPower1-SetTarget [{“name”:“newTargetValue”,“value”:“1”}]
      • run Boiler Error (0-214)
      • Lua
        –luup.log(“Boiler error state”)
    • Transitions
      • ‘30s’ - 1597f8ec-b91e-46de-b309-15d7685a3a82
  • When
    * Timer: ‘30s’ Duration: 30 seconds * Moves To
    * State: Update Boiler
  • Set Heartbeat - fac6189c-9a4e-4f80-8537-4bf987288c35
    • Set heartbeat,
    • OnEnter
      • on Boiler HeartBeat (0-366) urn:upnp-org:serviceId:SwitchPower1-SetTarget [{“name”:“newTargetValue”,“value”:“1”}]
    • Transitions
      • ‘C2 20s’ - 5f23cf63-e66c-4ac5-9388-fb49f610f839
  • When
    * Timer: ‘20s’ Duration: 20 seconds * Moves To
    * State: Update Boiler

And a 2 minute log, grepped with “Altui” is attached. (There may be other bits in there, as I have a number of other workflows.)
HTH
Octo.

! edit - trying again with debug log and a fresh history at the corresponding time. I have had a look through L_altui, it feels like the timer expired property is getting cleared, there are no other valid transitions, and so the armtimer control is not getting called for the next state, but I cannot work out why :frowning:
Happy to try out any suggestions.
Altui log.zip (15.7 KB)

not sure yet but first comment, I would never use the variable target for a trigger.
you have to use Status , devices have a latency between Target and Status. the action SetTarget sets a new target but the device is only in the new state when the variable Status is set

also you may want to try the investigate the “only when triggered” option of the transition

also , I would definitely get rid of the Luup.sleep()

I am almost certain that is the root of the issue, it is probably a design limitation but you basically set a variable in an action, and you have a watch on that variable in a transition of the same workflow, so as soon as you set it , it executes the other transaction based on that trigger and we have a re-entrancy issue most likely

short answer; set target but trigger based on status

Ummm, yes, but I only put those bits in to try and stop the effect…

  • the sleep was added as a trial in case the state needed something to do,

  • and I did have the error test on the Status, but changed it because the previous state (where the target was set) is exited without the requested delay, so I wanted to get rid of the latency on the switch :slight_smile:

I have changed the test back to “status”, but still get the same effect, it always ignores one timer, always obeys one, and alternates on the third - weird !

The error transition is not getting triggered, only the timers are running, so going to “only when triggered” should not make any difference. (just tried and indeed it does not.)

(Note: in the original incarnation, I had two main states, and set the flag on EXIT of the Check state, but that did gave a race condition (even on “Status”) where sometimes that would be detected as the error trigger, hence I added the third state, so that the flag is not set and tested in the same state)

so the best is with status version , send me another workflow report and put ALTUI in DEBUG mode and then in the UI click reset on the workflow ( note down the time so we can correlate with the logs )
reproduce your pb and capture the log, please share the log.

I don’t use workflow so forgive my ignorance here but this problem reminds me of something I had a couple of years ago with a couple of plugins when I transitioned to openLuup. So @Octoplayer, Are you running this on a vera or openLuup?
What I had noticed was that some plugins, designed for the vera, were working because of how slow the vera was. Namely variable watches would work because a triggering variable would change and change back slow enough for the watch to catch it. On openLuup, depending on the hardware you run it on, it is very possible that the variable would change back before the watch would detect it. @futzle’s countdown timer is an example of such a problem. I ended up adding a luup.call_delay in the plugin to force the variable to remain unchanged for 1s. Again, I may be completely off base here but just in case this could be useful…

Thanks Rafale, this is iindeed running on Vera. The variable test is not racing, it is not getting triggered, and has (in theory) 20s to change (and is a Virtual switch, so only takes microsecs to change). The basic machine is going round 3 states sequentially, and should just wait 20s between each transition, but regularly misses the wait and just jumps straight to the next state.
I have just been playing with a 4th state, to force a delay, similar thing happens, it goes round all 4 states once, then does 2 jumps with no delay, then 2 with delay. It looks as though the code has an internal race between expiring the timers and enabling the next timer, but I cannot spot where that is happening.
Okay - I moved the flag setting to the On Exit stage; no difference, deleted the flag watch; no difference, and then took out the Error State entirely. Defo a race in the timer handler, I can see TWO instances of the timers in the history, one saying “Timer:C 20s”, and one “C20s”, both moving from the Update State. Then “C2 20s” and “Timer:C2 20s” run immediately.

Got to go out: BBL to upload files

can you please try to avoid delete all transition with timers and save / reset workflow
then reload luup and then recreate timer transition one by one without using any clone feature just to be sure.

we also need to see your Lua scenes content and make sure your watch is only based on Status variable, not Target

Hi AMG,
So above I had deleted the watch entirely, and then the error scene, and was still getting the skipping each time. (I was saving and reloading luup after each change).
I had also deleted and recreated the timer transitions, reduced to 10s to speed up testing, still getting the skipping. I deleted the scene call that runs the boiler and clears the heartbeat - still skipping,and changed the switch from going on to going off - no difference.
I then deleted the action to set the virtual switch (which was not then actually changing the state) and the skipping stopped! I have reinstated the call but the timers are now running correctly. It looks like the presence of an action in one state was causing timers to be ignored (or actioned twice) in two states - maybe something corrupted, although the action itself was working fine??
I have now stripped the workflow back down to two states which just call the update scene and will leave that running for a few days to confirm it is working.

The Report is attached below, showing there were no watches, and I can PM you the corresponding log file if that would still be of interest?

Thanks
Octo

Boiler Refresh 0-4 - (0-4)
5 States: Start, Check(boiler), Update Boiler, Set Heartbeat, New
5 Transitions: 30s, c3 20s, 20s, C 20s, C2 20s
Details
Start - 0e4445ab-695e-41f1-b2ae-29714516f34c
Transitions
‘30s’ - fdc966d1-9cc6-42f6-9584-0875dc688eac
When
Timer: ‘30s’ Duration: 30 seconds
Moves To
State: Check(boiler)
Check(boiler) - 1fdeb540-9a6c-49e3-9d61-f0bd192172cb
warn if boiler scene call failed (eg nil value)
Transitions
‘20s’ - c1883faf-0e3f-4aa9-9a5d-2e32486c00f0
When
Timer: ‘20s’ Duration: 10 seconds
Moves To
State: Set Heartbeat
Update Boiler - 8ec1d53a-7a03-4965-ba40-333b82db60fd
OnEnter
run BoilerControl (0-74)
Transitions
‘C2 20s’ - 5f23cf63-e66c-4ac5-9388-fb49f610f839
When
Timer: ‘20s’ Duration: 10 seconds
Moves To
State: New
Set Heartbeat - fac6189c-9a4e-4f80-8537-4bf987288c35
Set heartbeat
OnExit
on Boiler HeartBeat (0-366) urn:upnp-org:serviceId:SwitchPower1-SetTarget [{“name”:“newTargetValue”,“value”:“1”}]
Transitions
‘C 20s’ - d5eeb715-6518-4799-a611-7fcd48af14fb
When
Timer: ‘20s’ Duration: 10 seconds
Moves To
State: Update Boiler
New - 9477dc4a-1198-44f3-848d-fa6ff2672d39
Transitions
‘c3 20s’ - af605789-8856-4e08-a1ed-2625dbbf94de
When
Timer: ‘20s’ Duration: 10 seconds
Moves To
State: Check(boiler)

Hi AMG,
I have been investigating this issue I saw a while ago with timers being ignored on my workflows. I did a work-around on the original WF, but couple of days ago I noticed it on another one, so started digging again, and got a lot further.
I have a state with an action which turns on a device, then SHOULD wait and test that it succeeded (I want to be 100% sure the switch has turned on/off, or it may drain my batteries)
However, the wait is ignored.
What seems to be happening is that, in executeStateLua: any old timers are expired, the lua, scenes and actions are run, then the new transitions and timers are enabled.
But… the device switches on BEFORE the Arm Transitions function has run, I have a watch set on it so EvaluateTransitions is run by the callback.
Before that checks if there are any actual expressions currently associated with the condition change (which there aren’t), it checks if there is a timer.
There is, but it has not yet been armed, so the code sees it as expired, thinks it has run its course, and so calls executeStateLua to transition straight to the next state.

One option would be to move the the “TimerExpired” check to after the expression evaluation, but cannot get my head around whether that would work.
I have just tried putting in a “Disable” flag in the Execute module, so Evaluate does not check the timers while the actions are being run, then re-enabling once the ArmTransitions part is complete. This seems to be working, but I will test it a bit more and post the code.
Any thoughts on if this makes sense / is correct, or how better to fix ?

Cheers
Octo

Hi AMG,
I have been running my modded code for a month now, and not seen any race conditions developing, so here are the changes I made, one line to check the flag, a couple to enable / disable it and a couple of debug changes to monitor the problem:

L_ALTUI.zip (41.0 KB)

Regards
Octo

local timers_disabled = false – octo - flag to stop interrupts from clearing timer

– otherwise check if timer expired
if (link.prop.timer ~= “”) and timers_disabled == false then – octo, ignore timer expiry if in the middle of enabling them
debug(string.format(“Wkflow - link has a timer.”))
local res = link.prop.expired
link.prop.expired = false
if (res==true) then
debug(string.format(“Wkflow - link had expired.”)) – octo
return true
end
end

– execute onExit of old state
timers_disabled = true – octo - stop timers getting cleared as they are currently expired

– execute onEnter of new state
executeStateLua(lul_device,workflow_idx,newstate,“onEnterLua”)
executeStateActions(lul_device,workflow_idx,newstate,“onEnter”)
executeStateScenes(lul_device,workflow_idx,newstate,“onEnterScenes”)
armLinkTransitions(lul_device,workflow_idx,newstate)
timers_disabled = false --octo - timers are now running

debug(string.format("Wkflow - blocked:%s, paused:%s", tostring(Workflows[workflow_idx].blocked or false), tostring(Workflows[workflow_idx].paused or false))) --octo

		debug(string.format(strWorkflowTransitionTemplate, Workflows[workflow_idx].altuiid, link.labels[1].attrs.text.text, oldstate.attrs.text.text, targetstate.attrs.text.text)); --Octo
		nextWorkflowState( lul_device, workflow_idx, oldstate, targetstate , getCellName(link))

we have to give some thoughts, I wonder if what you propose has the risk to make some other workloads lose a timer while the timers are being disabled, we may need it but by individual workflows

Hi AMG,
Yes I agree, A single global flag is not a good in an async multithread system, It started as a quick fix to see if my theory about the reason for skipping states was correct. I haven’t noticed any issues, but may have been lucky or just missed the issue,

An array should be a simple, safe solution,although there may be a better way to fix the problem :slight_smile:

THanks
Octo

Hi AMG,
I have finally got around to putting the flag into an array, still seems to be working. The updated code is attached here:
L_ALTUI.zip (41.0 KB)

Thanks
Kevin

Thank you, I have integrated it in a official V2.50 on GitHub only for now