What would be the most efficient pattern to match an XML tag with possible attributes?
I just discovered that the pattern “<tag%s?.->” works but is terribly slow.
Is there an alternative that would be more efficient?
This is probably not quite the answer you are looking for, and I suspect you have already checked this out, but have you had a look at Futzle’s xpath code in the WeMo plugin? It uses the LuaExpat library in Vera:
http://matthewwild.co.uk/projects/luaexpat/index.html
Here is a cut down version of Futzle’s code that shows it at work:
http://forum.micasaverde.com/index.php/topic,15566.msg119956.html#msg119956
I’m not sure whether Futzle’s xpath code handles tag attributes. “lxp.lom.parse” does handle attributes:
http://matthewwild.co.uk/projects/luaexpat/lom.html
but I found it horrible to use and I don’t recommend it. Anyway, I may be a little off track with this post.
Regular expressions are awful for parsing XML. You get edge cases like this where an attribute contains a greater-than character:
<Element Attribute="x>y">
That’s perfectly valid XML; you don’t have to escape > to &gt; like you do with less-than. If you want to catch that with a regular expression then you find yourself counting even/odd quotation marks. I don’t think Lua patterns are powerful enough to do it all in one step.
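To make the edge case concrete, here is a minimal sketch in plain Lua. The helper name find_tag_end is made up for illustration. It scans forward from the opening < and ignores any > that sits inside a quoted attribute value. It makes no attempt to handle comments, CDATA sections or processing instructions.

```lua
-- The naive pattern stops at the first '>', which here is inside the attribute value.
local xml = '<Element Attribute="x>y">text</Element>'
local naive = xml:match("<Element[^>]*>")

-- Hypothetical helper: return the index of the '>' that really closes the tag
-- starting at position start, skipping over quoted attribute values.
local function find_tag_end(s, start)
  local quote = nil            -- current quote character, or nil when outside quotes
  for i = start, #s do
    local c = s:sub(i, i)
    if quote then
      if c == quote then quote = nil end
    elseif c == '"' or c == "'" then
      quote = c
    elseif c == ">" then
      return i
    end
  end
  return nil                   -- the tag is never closed
end

local full = xml:sub(1, find_tag_end(xml, 1))
print(naive)  -- <Element Attribute="x>
print(full)   -- <Element Attribute="x>y">
```

This is the "counting quotation marks" step written as a loop, which is why it does not fit into a single Lua pattern.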
Which is why I went for the Lua XML parser (lxp) in the WeMo plugin. It’s an event-based (SAX) parser that invokes callback functions when it sees an Element or Text or Comment or whatever the XML contains. Better, it takes care of matching opening and closing tags, of XML escaping, and of XML namespaces. It’s relatively easy to write callback functions that will grab any part of the XML you care about. I based mine on the XPath syntax, since I’m an XSLT tragic. But you can hand-roll very specific code if you care only about one thing. Also, it’s fast, because all of the actual parsing code is C.
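As a rough illustration of the callback style (not the WeMo plugin's actual code), here is a minimal sketch using lxp directly. The sample document and the seen table are invented for this example, and the pcall guard is only there so the snippet exits cleanly where LuaExpat is not installed.

```lua
-- Requires the LuaExpat module (lxp); bail out quietly if it is missing.
local ok, lxp = pcall(require, "lxp")
if not ok then
  print("lxp is not available")
  return
end

local xml = '<Element Attribute="x>y">hello &amp; goodbye</Element>'
local seen = { text = {} }     -- illustrative place to collect what we care about

local callbacks = {
  -- Called for every opening tag; attributes maps attribute names to values.
  StartElement = function(parser, name, attributes)
    seen.name = name
    seen.attribute = attributes.Attribute
  end,
  -- May be called several times for one text node, so collect the pieces.
  CharacterData = function(parser, text)
    seen.text[#seen.text + 1] = text
  end,
}

local parser = lxp.new(callbacks)
assert(parser:parse(xml))
assert(parser:parse())         -- no argument: signals the end of the document
parser:close()

print(seen.name, seen.attribute, table.concat(seen.text))
```

The > inside the attribute and the &amp; in the text are both handled by the parser, so the callbacks only ever see decoded values.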
[quote=“lolodomo, post:1, topic:177759”]What would be the most efficient pattern to match an XML tag with possible attributes?
I just discovered that the pattern “<tag%s?.->” works but is terribly slow.
Is there an alternative that would be more efficient?[/quote]
Using “<tag%s?[^>]->” is faster.
I do my XML parsing with Lua patterns and it seems to work in my specific context.
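For reference, a minimal sketch of this approach in plain Lua. The sample fragment and the helper name parse_attributes are invented for illustration. The frontier %f[%s/>] (undocumented in Lua 5.1 but available there) stops <res from also matching a longer name such as <result>, which the %s? version would accept. Like any [^>] pattern, it still breaks if an attribute value contains a literal >.

```lua
-- Sample input, made up for this example.
local xml = '<result/><res protocolInfo="http-get:*:audio/mpeg:*" '
         .. 'duration="0:03:25">http://host/track.mp3</res>'

-- %f[%s/>] is a frontier: the character after the name must be whitespace, '/' or '>'.
-- The captures are the raw attribute text and the element content.
local attrs, content = xml:match("<res%f[%s/>]([^>]-)>(.-)</res>")

-- Hypothetical helper: split name="value" pairs into a table.
local function parse_attributes(text)
  local result = {}
  -- Capture 2 is the opening quote; %2 requires the same quote to close the value.
  for name, _, value in text:gmatch('([%w_:.%-]+)%s*=%s*(["\'])(.-)%2') do
    result[name] = value
  end
  return result
end

local a = parse_attributes(attrs)
print(a.protocolInfo, a.duration, content)
```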
[quote=“futzle, post:3, topic:177759”]Regular expressions are awful for parsing XML. You get edge cases like this where an attribute contains a greater-than character:
<Element Attribute="x>y">
That’s perfectly valid XML; you don’t have to escape > to &gt; like you do with less-than. If you want to catch that with a regular expression then you find yourself counting even/odd quotation marks. I don’t think Lua patterns are powerful enough to do it all in one step.[/quote]
It’s clear that my code will not handle this case.
At the same time, I am not sure we can encounter this case in the UPnP context and SOAP messages. Everything related to metadata (which could contain a >) is normally escaped.
But your remark makes me think that I should check that I always escape data when building XML.
For example, one test I should try is to name one of my Sonos zones like this (if the Sonos control application accepts it) and see what happens: <Bedroom '&1">
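A minimal sketch of the kind of escaping that check would rely on, in plain Lua. The names xml_escape and XML_ENTITIES are illustrative and not taken from any plugin.

```lua
-- Hypothetical helper: escape the five characters that are special in XML.
local XML_ENTITIES = {
  ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;",
  ['"'] = "&quot;", ["'"] = "&apos;",
}

local function xml_escape(text)
  -- A single gsub pass, so the '&' of an inserted entity is never escaped again.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (text:gsub("[&<>\"']", XML_ENTITIES))
end

local zone = [[<Bedroom '&1">]]
print(xml_escape(zone))  -- &lt;Bedroom &apos;&amp;1&quot;&gt;
```

Escaping > is not strictly required, but doing it anyway keeps the output safe for pattern-based readers that stop at the first >.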