1 (edited by spunk_ 2010-12-11 09:41:37)

Topic: umlaut like "ä, ö, ü" and other special characters.

hello

i remember that a long time ago someone mentioned it could be easy to change the control panel or the firmware to show umlauts and other special characters.

without this option most, or rather all, rss feeds, mails and even the metadata in radio streams are impossible to read.


will this change be done before christmas (this year), or does everyone have to wait until next christmas?

Re: umlaut like "ä, ö, ü" and other special characters.

This is handled on a widget-by-widget basis, since the fonts are almost always included in the widgets.

The other issue is the encoding.  Several widgets include extended characters; however, FlashLite is limited to UTF-8 character encoding, and many feeds use other encodings, which means the characters get garbled long before they are rendered.

In the worst cases, some XML feeds include multiple encodings in the same document, which is invalid.  This is done by sloppy server developers who simply concatenate files without regard to encoding.  Many browsers and native RSS readers have extra code that attempts to mitigate this, but FlashLite requires well-formed documents.

The Control Panel and most chumby-published widgets include the extended characters; however, neither can do anything about incompatible or invalid character encoding.

Re: umlaut like "ä, ö, ü" and other special characters.

Duane wrote:

This is handled on a widget-by-widget basis


what needs to change in the following widgets so that they show all umlauts?

for example:
- youstreams (umlauts are not shown in podcasts, but the rest is readable)
- generic rss-reader (umlauts are not shown and the rest of the text is almost unreadable)
- email viewer (like generic rss-reader)


is this an error in the widgets, or is there an easy way for the creator to change this?
then i would contact the creator and ask for an improvement.

Re: umlaut like "ä, ö, ü" and other special characters.

I don't know without looking at the data feeds being used.

When you create a TextField in Flash, you specify whether or not you want the font to be embedded in the movie (almost always the case), and then how much of the font you want to embed.  I think in most versions of Flash, the default set of characters is 7-bit ASCII (aka "Basic Latin"), which consists of 95 glyphs, so you have to explicitly specify what you want if you want more than that.

It's a normal part of Chumby's widget development process to add the "Latin-1" character set (which has 388 glyphs) - however, we do occasionally miss one.
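
To make the "Basic Latin" limit concrete, here is a small Python sketch (not ActionScript, and not anything from the chumby toolchain) that lists which characters of a string fall outside the 95 printable 7-bit ASCII characters.  The names `BASIC_LATIN` and `chars_outside` are made up for this illustration.

```python
# Printable 7-bit ASCII: U+0020 (space) through U+007E (~), 95 characters.
BASIC_LATIN = range(0x20, 0x7F)


def chars_outside(text, embedded=BASIC_LATIN):
    """Return the distinct characters of `text` that are not in `embedded`."""
    return sorted({ch for ch in text if ord(ch) not in embedded})


print(len(BASIC_LATIN))                 # 95
print(chars_outside("Grüße aus Köln"))  # ['ß', 'ö', 'ü']
```

Any character this reports would need a larger embedded character set in the widget's font.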

The fact that the characters *after* the extended ones are garbled would seem to indicate an encoding problem.  Usually if the font is missing a character, it's simply skipped.

I think it's quite reasonable to add a comment to the widget that alerts the developer to the issue.  In the case of widgets that pull arbitrary feeds (such as RSS readers), it would be *very* helpful to provide a link to the actual troublesome feed so that it can be examined for possible encoding issues.

For those authors confronting this issue, it might be worth checking to see if Yahoo Pipes or Google App Engine can be used to re-encode a feed from an arbitrary or mixed encoding to UTF-8.

Re: umlaut like "ä, ö, ü" and other special characters.

Duane wrote:

I think it's quite reasonable to add a comment to the widget that alerts the developer to the issue

okay -  done for all three applications.

Duane wrote:

In the case of widgets that pull arbitrary feeds (such as RSS readers), it would be *very* helpful to provide a link to the actual troublesome feed so that it can be examined for possible encoding issues.

also done -  my example for rss is
http://www.tagesschau.de/xml/rss2

Re: umlaut like "ä, ö, ü" and other special characters.

Thanks!

Re: umlaut like "ä, ö, ü" and other special characters.

So, that feed you linked is an example of the encoding problem.  If you fetch the feed, the header says:

<?xml version="1.0" encoding="ISO-8859-1" ?>

...which means it's *not* encoded in UTF-8, and therefore not directly consumable by Flash. I should also note that this is not a problem specific to the chumby - all versions of Flash have this limitation.

In ActionScript 3, one could read this into a ByteArray, do the necessary character conversion, *then* pass it on to the XML parser, but for the ActionScript 2 used by FlashLite 3, this isn't possible.  This feed is a candidate for sending through some external service.

The sad thing about this is that it's possible to encode extended characters in a way that's consumable by *all* XML parsers, even ones that are limited to a certain encoding, which is to encode these characters as "character entities".  It seems that most server-side developers just take the easy way out.
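
As a rough illustration of the "character entities" approach, here is a server-side sketch in Python.  The helper name `to_char_refs` and the sample title are invented for the example; it only handles non-ASCII characters, so "&" and "<" would still need the usual XML escaping separately.

```python
import xml.etree.ElementTree as ET


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


title = "Gespräche über Köln"
escaped = to_char_refs(title)
print(escaped)  # Gespr&#228;che &#252;ber K&#246;ln

# The document is now pure ASCII, so the bytes are identical in ISO-8859-1
# and UTF-8, and a parser recovers the original text from the references.
doc = '<?xml version="1.0" encoding="ISO-8859-1" ?><title>' + escaped + "</title>"
print(ET.fromstring(doc.encode("ascii")).text == title)  # True
```

Because every byte stays within 7-bit ASCII, a parser that only understands UTF-8 reads exactly the same document as one that honors the declared encoding.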

Re: umlaut like "ä, ö, ü" and other special characters.

Duane wrote:
<?xml version="1.0" encoding="ISO-8859-1" ?>

...which means it's *not* encoded in UTF-8, and therefore not directly consumable by Flash. I should also note that this is not a problem specific to the chumby - all versions of Flash have this limitation.

but what is the problem with always assuming that the header is wrong when there are umlauts in the following text?
i think all german feeds use umlauts.

just integrate "always show umlauts even if the header says there are none".


Duane wrote:

The sad thing about this is that it's possible to encode extended characters in a way that's consumable by *all* XML parsers, even ones that are limited to a certain encoding, which is to encode these characters as "character entities".  It seems that most server-side developers just take the easy way out.

perhaps - i do not see any problem with umlauts in this feed on my telephone (yes, this telephone can show feeds on its display) or, of course, in the web browser.



the biggest problem is of course the email viewer, because all mails use umlauts; there is none without. so it is a must-have for this application to ignore the header and display all characters used.

Re: umlaut like "ä, ö, ü" and other special characters.

Well, that header tells the XML reader how the characters are encoded - I took a look at the body of the feed, and it is indeed encoded in ISO-8859-1, not UTF-8.  The header is correct, and there are indeed extended characters in that feed encoded that way.

You can't just say "understand umlauts", because the way they're represented in the data is completely different for each encoding method.  Fundamentally, these feeds are a sequence of bytes - an umlauted character is a single byte in ISO-8859-1 and a sequence of two bytes in UTF-8 - and when the reader encounters a byte outside of the normal 7-bit ASCII range, it needs to know the encoding method in order to properly interpret the data.
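
To see the difference at the byte level, here is a short Python sketch (Python only because it makes the bytes easy to print; the sample word is arbitrary):

```python
word = "für"

latin1_bytes = word.encode("iso-8859-1")  # ü becomes the single byte 0xFC
utf8_bytes = word.encode("utf-8")         # ü becomes the two bytes 0xC3 0xBC

print(latin1_bytes)  # b'f\xfcr'
print(utf8_bytes)    # b'f\xc3\xbcr'

# Reading UTF-8 bytes as ISO-8859-1 "works", but yields the wrong characters.
print(utf8_bytes.decode("iso-8859-1"))  # fÃ¼r

# Reading ISO-8859-1 bytes as UTF-8 fails, because a lone 0xFC is not a
# valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

The same three-letter word is three bytes in one encoding and four in the other, so a reader that guesses wrong cannot recover the text.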

Your phone probably has a better XML parser than FlashLite, though I've also seen devices understand ISO-8859-1 (which is commonly created by Windows machines) and *not* UTF-8 (which every XML parser is *required* to support by the XML specification).

ISO-8859-1 is actually a quite poor encoding method - unlike UTF-8, it can only handle western European languages.  It can't handle, say Turkish, some of the Baltic languages, Greek, etc.

As I said, as long as the application is in FlashLite 3, then it simply won't be able to *directly* read this particular feed.  I agree it's an issue that should be resolved, but it currently can't be done on the client alone.

Re: umlaut like "ä, ö, ü" and other special characters.

Duane wrote:

As I said, as long as the application is in FlashLite 3, then it simply won't be able to *directly* read this particular feed.  I agree it's an issue that should be resolved, but it currently can't be done on the client alone.

do you mean it could be part of the chumby itself - maybe the control panel or the firmware or something else - without changes by the programmers of the applications?

just a simple switch in the install procedure where the owner can decide whether a wrong header should be ignored and the characters inside should be interpreted as umlauts or something else.

Re: umlaut like "ä, ö, ü" and other special characters.

You can't ignore the header - it tells you what the encoding is.  The headers are correct - the problem is that FlashLite does not support any encoding besides UTF-8.  It *is* ignoring the headers, and is therefore confused about what the data actually means.

Think of it this way - pretend that every time you speak a sentence, you must say what language it's in first, something like "English: my cat is on fire", or "French: mon chat est sur le feu" (please pardon any goofy translations).

Now consider what happens if someone simply says "mijn kat is op brand".  The problem here is that we have no idea what language this is to make sense of the sentence - we require that the person say "Dutch: mijn kat is op brand".  That's the way XML works, though it has a default - if you don't specify otherwise, the encoding is UTF-8, which would be like saying that anything spoken without specifying the language is always English.

The problem here is that Flash *only* speaks "English" - you can speak French or Dutch to it all day long and it simply won't understand, *even* if you explicitly say what language it's in.  It will always try to interpret what you say as English, and since it's not, it garbles what you say. In the case of the Dutch, a few of the words are the same or very similar in English ("is"), and some appear in English, but have different meaning ("brand").

To solve this requires that the data be "translated" by some external service - in our analogy, the "Dutch" data feed would go through a "translation" service and be converted to "English".  In the case of the simple POP3 client on the chumby, that could be done locally by modifying the code.  For random feeds meant to be consumed by the RSS widget, it would have to be something not on the device, and the widgets would have to be modified to use whatever translation service that might be.  As I mentioned, it's quite possible that Yahoo Pipes or Google App Engine could do this, depending upon how good their XML parsers are.
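
For anyone wanting to try this, a minimal sketch of the "translation" step in Python might look like the following.  This is not how Yahoo Pipes or App Engine do it; the function name `reencode_to_utf8` is invented, and it assumes the feed starts directly with its XML declaration (no byte order mark) and uses an ASCII-compatible encoding that Python knows by name.  Fetching the feed and serving the result are left out.

```python
import re

# Matches the XML declaration up to the encoding name, for example
# <?xml version="1.0" encoding="ISO-8859-1
_DECL_PREFIX = r'^(<\?xml[^>]*?encoding\s*=\s*["\'])([A-Za-z0-9._-]+)'


def reencode_to_utf8(raw: bytes) -> bytes:
    """Decode a feed using its declared encoding and return it as UTF-8."""
    # The declaration itself is plain ASCII in ASCII-compatible encodings,
    # so it can be inspected before the body is decoded.
    head = raw[:200].decode("ascii", errors="replace")
    match = re.match(_DECL_PREFIX, head)
    declared = match.group(2) if match else "utf-8"  # XML's default

    text = raw.decode(declared)

    # Rewrite the declaration so that it matches the new byte encoding.
    text = re.sub(_DECL_PREFIX, r"\1UTF-8", text, count=1)
    return text.encode("utf-8")


sample = (
    '<?xml version="1.0" encoding="ISO-8859-1" ?>'
    "<rss><title>Übersicht</title></rss>"
).encode("iso-8859-1")

print(reencode_to_utf8(sample).decode("utf-8"))
```

A widget would then be pointed at the service's URL instead of the original feed.  Feeds that mix encodings in one document would still break, since a single declared encoding cannot describe them.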

Re: umlaut like "ä, ö, ü" and other special characters.

Duane wrote:

To solve this


thanks -  you did it.

it works for the feed linked above now: all umlauts are displayed.

Re: umlaut like "ä, ö, ü" and other special characters.

ARD has changed the feed to UTF-8, therefore it's working now.