Apostrophe changed to strange characters in user_data view

parkerc · September 4, 2018, 7:50pm

Hi,

I was wondering why I could not find a couple of devices in the user_data lookup (192.168.1.111:3480/data_request?id=user_data), and I discovered that for some reason the apostrophe has been changed to a euro and a trademark symbol?

Bob??s Room Temperature
Jill’s Room Temperature

In the Vera UI the apostrophe is clearly there, but not in the user_data extract.

Just curious if anyone else has seen this?

rigpapa · September 4, 2018, 9:58pm

Your apostrophe must be one of the “fancy” versions, not the plain vanilla ASCII 0x27, so UTF-8 encoding kicks in and converts it to a three-byte sequence for storage. When it’s pulled out for display, that encoding is undone, which is why it appears normal on screen.
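
To make that concrete, here is a minimal Python sketch of what is probably happening. It assumes the “fancy” apostrophe is U+2019 (the usual iOS smart quote) and that whatever displayed the user_data output read the UTF-8 bytes as Windows-1252; both are guesses, not something confirmed for Vera.

```python
# Sketch only: assumes the curly apostrophe is U+2019 and the viewer used cp1252.
curly = "\u2019"                       # RIGHT SINGLE QUOTATION MARK
utf8_bytes = curly.encode("utf-8")     # stored as a three-byte sequence

# Show the three bytes in hex.
print(" ".join(f"{b:02x}" for b in utf8_bytes))

# Reading those bytes with a single-byte code page yields three separate
# characters instead of one apostrophe (the "euro and trademark" effect).
mojibake = utf8_bytes.decode("cp1252")
print("Jill" + mojibake + "s Room Temperature")

# Reading them as UTF-8 undoes the encoding and recovers the apostrophe.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == curly)
```

If that is the cause, the data itself is fine and only the tool viewing it is decoding with the wrong character set.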

parkerc · September 5, 2018, 3:00am

Interesting, thanks @rigpapa. Seeing as it is the only apostrophe I have, do you have any idea how I change it?

It?s unlikely to be a coincidence that you raise this, as I reported an issue to the forum admin a while back: when I use my iPad keyboard, my apostrophes are converted to question marks! - http://forum.micasaverde.com/index.php/topic,51369.msg331299.html#msg331299

rigpapa · September 5, 2018, 10:51am

Ah. I’m not a Mac/iPad guy, but apparently if you long-press on the apostrophe key on your iPad keyboard, you get other character choices, one of which is the “straight” apostrophe that is ASCII 0x27. It is also rumored that you can turn off “Smart Quotes” or “Smart Punctuation” to permanently stop the iPad from inserting the “fancy” quotes and just insert the basic ASCII character.

parkerc · September 6, 2018, 3:10am

Who would have thought there would be so many apostrophe choices?

OK - let's try it.

On my iPhone now - here?s the current default apostrophe choice ?. next after long press here?s the first option = ? , second option = ?, third option = ? and finally the fourth = `

Now to publish to see the results

parkerc · September 6, 2018, 3:12am

That?s funny - out of 4 or 5 apostrophe choices on the iOS keyboard, only one will appear as an apostrophe on this forum site; the rest are converted to question marks.

rigpapa · September 6, 2018, 11:57am

Yes, only 0x27 is in the lower range of ASCII characters in the browser’s code set. Everything in the higher range has to be converted. To over-simplify a bit, each language has its own set of characters. The positions of the “standard” characters in the lower 128 ASCII range (95 printable characters that include digits 0-9, upper- and lower-case A-Z, the basic punctuation marks, etc.) are fixed, but characters with diacritical marks (accents, umlauts, etc.) and “fancy” stuff like curly apostrophes and quotes that are more typographically appealing end up in the high range, and that high range can map differently for other languages.

Since each language’s character set can be different (the character at ordinal position 196 in one set is different from that in the same position in another), a string written in one character set can render incorrectly in another. To resolve this, UTF encoding converts those characters to a standard mapping from a large, unified set of characters for all languages (Unicode). Then, when a UTF-encoded string needs to be displayed in a different language, those characters are decoded and mapped back to where they are in that language’s character set.
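
A quick way to see the “position 196” point is to decode the same byte under a few legacy single-byte character sets. The three code pages below are picked purely for illustration; they are not claimed to be what Vera or the forum uses.

```python
# Sketch: one byte value, three legacy character sets, three different characters.
raw = bytes([196])

decoded = {}
for codec in ("latin-1", "iso8859_7", "cp1251"):   # Western, Greek, Cyrillic
    decoded[codec] = raw.decode(codec)
    print(codec, decoded[codec])

# In the unified set each of those characters has its own code point,
# and UTF-8 gives each a distinct multi-byte sequence.
for codec, ch in decoded.items():
    print(codec, f"U+{ord(ch):04X}", ch.encode("utf-8"))
```

The byte is identical in every case; only the table used to interpret it changes, which is exactly the ambiguity the unified mapping removes.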

All of this is primarily a function of support for non-Roman languages, like Greek, Cyrillic, Arabic, Thai, Japanese, Chinese, etc. I remember when Windows made this shift. It was a lot to wrap your head around, and many developers felt that we had done fine with the basic 95 characters in 7-bit ASCII (the lower 128) up to that point, so why change? Oh, maybe because most of the population of the planet uses non-Roman character sets. But simple things got harder, for sure. It was no longer possible to directly scan the bytes of a stored string to find a substring, for example. You had to take extra steps, notably matching your encodings, and now searches for one character have to be multi-byte. It also introduces the possibility that characters may look very much the same and be used interchangeably, like the lower ASCII 0x27 apostrophe and your curly apostrophe, but searching for one doesn’t match the other, in spite of their visual similarity and use. More work to fix that.
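
As a rough illustration of that last point, and of one way to do the “more work”, here is a small Python sketch. The helper name `normalize_apostrophes` and the list of look-alike characters are my own choices for the example (the list is not exhaustive), and the device name is just the one from the first post.

```python
# Sketch: look-alike apostrophes do not compare equal until normalized.
# The mapping below is illustrative, not a complete list of look-alikes.
APOSTROPHE_LIKE = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (typical "smart" apostrophe)
    "\u02bc": "'",   # MODIFIER LETTER APOSTROPHE
    "\u2032": "'",   # PRIME
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with plain ASCII 0x27 (hypothetical helper)."""
    return text.translate(_TABLE)


stored = "Jill\u2019s Room Temperature"   # as saved with a curly apostrophe
query = "Jill's Room Temperature"         # as typed with ASCII 0x27

print(stored == query)                          # the raw strings differ
print(normalize_apostrophes(stored) == query)   # they match once normalized

# The curly version is also longer in bytes: one character, three bytes.
print(len(stored.encode("utf-8")), len(query.encode("utf-8")))
```

Normalizing both the stored name and the search term before comparing is usually simpler than trying to list every variant in the search itself.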

As for this forum (SMF), it apparently doesn’t support UTF encoding, although the browser will happily jam it down SMF’s throat by default.
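
I have not looked at SMF’s internals, so treat this as a guess, but the question marks are what you would expect if the text is forced into a character set that has no slot for the curly apostrophe. A minimal Python sketch, with the function name `store_as_single_byte` and the choice of charsets being assumptions made for the example:

```python
# Sketch: forcing text into a charset that lacks the curly apostrophe.
# The charset actually used by the forum is an assumption here.
def store_as_single_byte(text: str, charset: str = "ascii") -> str:
    """Encode to a limited charset, replacing anything it cannot represent."""
    return text.encode(charset, errors="replace").decode(charset)


print(store_as_single_byte("It\u2019s unlikely"))             # curly apostrophe is lost
print(store_as_single_byte("It's unlikely"))                  # ASCII 0x27 survives
print(store_as_single_byte("It\u2019s unlikely", "latin-1"))  # same loss in Latin-1
```

Once the character has been replaced this way the original is gone, which would explain why only the plain 0x27 apostrophe and the backtick came through in the test post.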

parkerc · September 6, 2018, 6:43pm

Thanks so much for the explanation @rigpapa