Any multithreading anywhere in Vera?

I’m very curious to know if there is any multithreading going on in the Vera, and if so, how thread safety is implemented.

There is a user with mysterious failures: once in a while, a TCP/IP socket used by the Nest plugin returns an error saying the socket has been closed, and this happens while another LuaSocket-using plugin is also running. I doubt that LuaSocket’s C libraries are threadsafe, but I don’t know that for sure. If more than one Lua interpreter were running in the same process, and the underlying C libraries were not threadsafe, then different interpreter threads that used LuaSocket would eventually step on each other, with very hard-to-diagnose failures resulting.

So, is there any multithreading going on in Vera?

Thanks for any input!

watou

I suspect the simple answer to your question is no! Have a look at this thread:

http://forum.micasaverde.com/index.php/topic,13796.msg105342.html#msg105342

The only solution I’ve found is to schedule comms in my plug-ins so they don’t interfere with one another. Nothing like ideal, but it works.

Thanks for the link. That was the mention @RichardTSchaefer made regarding multithreading that I had forgotten about. But that suggests that the answer to my question is “yes” – that Richard thinks there is multithreading going on.

And while a plugin author can avoid clobbering his own plugin by avoiding simultaneous socket activity (and the Nest plugin won’t clobber itself if there is multithreading going on), it doesn’t help if some other plugin is doing socket I/O at the same time in another thread, when both plugins are using underlying libraries that are not thread-safe.

Please let us know what is learned regarding the ticket you raised with MCV. If this hypothesis is correct, it would explain an entire class of mysterious failures.

watou

Vera creates one [tt]LuaState[/tt] per Plugin. Access to each [tt]LuaState[/tt] is controlled via at least one lock, to avoid the concurrency problems on the [tt]LuaState[/tt] context itself.

As others have mentioned, each [tt]LuaState[/tt] is not threadsafe, BUT Lua totally supports having multiple, isolated, ones running within the same process as long as they each have separate execution contexts.

Access to the [tt]LuaState[/tt] might occur when you’re running an [tt]ACTION[/tt] handler, a Timer callback, or an [tt][/tt] section (for example).
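To make the "one state per plugin, one lock per state" arrangement concrete, here is a rough sketch in Python rather than Lua/C, with a thread pool standing in for Vera's worker threads. The names [tt]PluginState[/tt], [tt]bump[/tt] and [tt]demo[/tt] are made up for illustration, and none of this is MCV's actual code; it only assumes that every entry point takes the plugin's lock before touching that plugin's state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PluginState:
    """Hypothetical stand-in for one plugin's LuaState plus its guard lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # serializes every entry into this state
        self.globals = {"calls": 0}    # private to this plugin, never shared

    def run(self, handler):
        # Any entry point (action handler, timer callback, ...) takes the
        # lock first, so only one thread touches this state at a time.
        with self.lock:
            return handler(self.globals)


def bump(env):
    # Read-modify-write that could race without the per-state lock.
    current = env["calls"]
    env["calls"] = current + 1
    return env["calls"]


def demo(n_calls=200, workers=4):
    states = {name: PluginState(name) for name in ("nest", "sonos")}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_calls):
            for state in states.values():
                pool.submit(state.run, bump)
    # Leaving the "with" block waits for all submitted handlers to finish.
    return {name: s.globals["calls"] for name, s in states.items()}
```

Calling [tt]demo()[/tt] should report 200 calls for each of the two pretend plugins, regardless of which pool thread ran which handler. The two states never share data, which is why separate states can run side by side while each individual state still needs its lock.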

In UI4 there were, from memory, 2 threads “globally” in a pool for all things running on Vera. This led to a multitude of deadlock-like situations for UI4 users with a lot of plugins (for a variety of reasons)… it also led to a lack of locking in some situations (see below).

In UI5, there are supposed to be 2 threads per Plugin, so that one Plugin cannot [as] readily jack-up another.

I don’t use [tt]luup.inet.wget[/tt] for anything, since it’s invoking more MCV code. I do have a multitude of things [frequently] running LuaSocket calls, and they’re not giving me any noticeable problems.

e.g. 3x Sonos Plugins, calling every 15 seconds, 1x Weather Plugin calling every 30 mins.

I’d be extremely surprised if LuaSocket has a thread-safety issue. It’s fairly hard to get it wrong when you have your own context to store stuff in.

UI4 users are basically hosed, since there were a load of situations where the locking wasn’t in place (adding it in UI5 is why MCV needed to add more threads).

As a result, when concurrent “events” occurred in UI4 (see above for examples) it had a high likelihood of corrupting the [tt]LuaState[/tt] object. When this happened, all sorts of ugly occurred, and often a restart of the LuaUPnP process was required.

Presumably your user is not on UI4…

Disclaimer: For full disclosure, every Vera version after 1.5.408 has caused my Vera 3 to “spontaneously” reboot when running one of my heavier scenes, or at other random times throughout the day (1-10x per day). AFAICT, this appears to be more of a problem with the [tt]NetworkMonitor[/tt] itself returning false positives and executing OS-level reboots.

Thank you very much for that detailed reply! Very helpful.

But I would say that, even if Lua C libraries are smart enough to keep their own data in the current context where they should, they could still very easily rely on lower-level code that uses data in non-threadsafe ways, and the original library author–who quite possibly never saw threaded code in his life–would never have considered or tested for the case where the whole stack is being used in multiple threads in the same process.

(I haven’t looked at the LuaSocket C code, and of course I have no knowledge of how MCV builds it or the interpreter.)

watou

Maybe, but whose code are you going to believe more: the code that’s been around for ~10 yrs and is widely eyeballed, or the code that has a short history, few eyes on it, and proven threading problems in its recent past (e.g. UI4)?

… you can always inspect the implementation of LuaSocket, for statics etc., since its code is widely available.

I’d rather not trust any code at all! :slight_smile: Seriously, I could inspect the LuaSocket C code every which way, but if MCV built it with the wrong library or compiler flags, or if the C code in turn used libraries that used no, or a different, threading support than what is used at higher levels, then there could easily be static data that is not visible to either you or me. So the Lua context will still not save you from that. And I have no reason to believe that the LuaSocket author made any provision or did any testing for how and where his code would run multithreaded.
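To illustrate the kind of hidden static data I mean, here is a small Python sketch of the pattern (Python only for brevity; the real concern is in C). The helper [tt]format_address[/tt], the [tt]Context[/tt] class and the host names are all invented for the example and are not taken from LuaSocket or from anything MCV ships. It assumes a low-level routine that hands back a reference to one internal buffer, the way some classic C functions do.

```python
# Hypothetical low-level helper with hidden static storage, in the style of
# C functions that return a pointer to a single internal buffer.
_static_buffer = [""]


def format_address(host, port):
    """Writes into the one shared buffer and returns a reference to it."""
    _static_buffer[0] = f"{host}:{port}"
    return _static_buffer


class Context:
    """Stand-in for an isolated interpreter state; it owns only `peer`."""

    def __init__(self):
        self.peer = None

    def begin_connect(self, host, port):
        # The context stores what looks like its own result, but it is
        # really a reference into the library's shared static buffer.
        self.peer = format_address(host, port)

    def peer_name(self):
        return self.peer[0]


def demo():
    a, b = Context(), Context()
    a.begin_connect("nest.example", 443)
    b.begin_connect("sonos.example", 1400)   # runs before A reads its result
    return a.peer_name(), b.peer_name()
```

Here [tt]demo()[/tt] returns the second address for both contexts, even though each context only ever touched its own fields. No amount of inspecting the context-level code would reveal that, which is the point: the per-context isolation holds only as far down as the code that actually honors it.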

Granted, this is all very theoretical, and may have nothing to do with the issue at hand. I’ve done multithreaded programming since OS/2 in the early 90s, back when Unix peeps were religiously against threads, and I’ve seen all kinds of nightmarishly difficult to debug problems that boiled down to a tiny bit of non-threadsafe code.

I would just like to have a better understanding of how code running in one context could possibly affect code running in other contexts, but we have very little public information about the architecture to go on (as far as I know, anyway).

watou

Sure, but for reference, we’ve seen the same types of [Thread] problems that UI4 had in recent UI5 deployments as well… they’re just a lot less frequent.

I don’t recall which build, but Ap ping’ed me in the background a few months ago with the exact symptom that we saw in UI4 (basically memory was getting trashed, and Lua Ptrs were suddenly pointing to fns instead of ints)

So yes, it’s always possible, but in this game, where there are a lot of black boxes present, you have to shoot for the most likely stuff to put attention to when debugging… And of course, keep an eye on the other options, just in case…

The other “possible, but not as likely” case revolves around the custom Network driver that MCV commissioned for OpenWRT and their Vera3 Sercomm hardware. Possible, but always less likely than other concerns…

PS: Seems you and I have similar backgrounds :wink:

[quote]back when Unix peeps were religiously against threads[/quote]

They’re heading back that way, at least for Scala folks :D

[quote]I’ve done multithreaded programming since OS/2 in the early 90s, back when Unix peeps were religiously against threads, and I’ve seen all kinds of nightmarishly difficult to debug problems that boiled down to a tiny bit of non-threadsafe code.[/quote]

I was one of the Unix peeps … and we wrote our own in-process threading library before there was a threads package. We did not support pre-emption … I believe that the Lua implementation in Vera is the same … unless you verify that all the libraries are thread safe … you can’t support pre-emption. With the newer threads package they probably have a global Lua thread lock and only allow one Lua thread to run at a time (per LUA_STATE). They have to trap the I/O requests and release the lock before doing the I/O call, and re-acquire it before returning to Lua. Prior to thread-aware I/O libraries … we converted synchronous I/O requests to asynchronous calls and yielded control to the scheduler. When the I/O had activity (the Unix select signal), we woke the thread that would have blocked on the I/O request.
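The "release the lock around the I/O call" idea can be sketched roughly as follows. This is Python rather than C, and [tt]GuardedState[/tt], [tt]blocking_io[/tt] and [tt]slow_read[/tt] are hypothetical names; whether Vera does anything like this is a guess on my part, not something I have seen in their code.

```python
import threading


class GuardedState:
    """Hypothetical interpreter state protected by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.log = []

    def blocking_io(self, io_call):
        # Called with the lock held. Drop it for the duration of the I/O so
        # other threads may enter the state, then take it back before
        # returning to interpreted code.
        self.lock.release()
        try:
            return io_call()
        finally:
            self.lock.acquire()


def demo():
    state = GuardedState()
    io_started = threading.Event()
    io_may_finish = threading.Event()

    def slow_read():
        # Pretend socket read: signal that we are blocked, then wait.
        io_started.set()
        io_may_finish.wait(timeout=5)
        return "data"

    def plugin_thread():
        with state.lock:
            state.log.append("A: before I/O")
            result = state.blocking_io(slow_read)
            state.log.append(f"A: got {result}")

    t = threading.Thread(target=plugin_thread)
    t.start()
    io_started.wait(timeout=5)
    # Thread A is parked inside its I/O call; the lock is free for us.
    with state.lock:
        state.log.append("B: ran while A was blocked")
    io_may_finish.set()
    t.join()
    return state.log
```

The log from [tt]demo()[/tt] shows the second thread getting into the state while the first is still waiting on its pretend socket, and the first thread resuming only after it holds the lock again. If the release or the re-acquire is missed anywhere, you get either a stall or two threads inside the state at once.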

If they do not handle their locking properly … they can possibly have two threads running Lua code … and I do not think the Lua kernel (LUA_STATE) is thread safe. It’s not just globals or static data … you also have to protect critical sections of code in the cases where you access multiple pieces of information that must stay coherent for the duration of the critical section.
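A minimal sketch of that last point, again in Python with invented names ([tt]DeviceTable[/tt], [tt]add[/tt], [tt]snapshot[/tt]): two fields that must always agree are updated and read only inside one critical section. It is an illustration of the principle, not a model of how the Lua internals are laid out.

```python
import threading


class DeviceTable:
    """Hypothetical structure with two fields that must always agree."""

    def __init__(self):
        self.lock = threading.Lock()
        self.names = []
        self.count = 0

    def add(self, name):
        # Both updates belong to one critical section: a reader must never
        # observe the list grown but the counter not yet incremented.
        with self.lock:
            self.names.append(name)
            self.count += 1

    def snapshot(self):
        # Read both fields under the same lock so they are consistent.
        with self.lock:
            return self.count, len(self.names)


def demo(writers=4, per_writer=500):
    table = DeviceTable()
    mismatches = []

    def write(tag):
        for i in range(per_writer):
            table.add(f"{tag}-{i}")

    def read():
        for _ in range(per_writer):
            count, length = table.snapshot()
            if count != length:
                mismatches.append((count, length))

    threads = [threading.Thread(target=write, args=(w,)) for w in range(writers)]
    threads.append(threading.Thread(target=read))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.snapshot(), mismatches
```

With the lock in place the reader never sees the counter and the list disagree. Neither field is a global or a static, yet taking the lock out of [tt]add[/tt] or [tt]snapshot[/tt] would let a reader catch the structure halfway through an update.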