how to get length of UTF8 encoded string

**TheCPUWizard** · December 31st, 2008, 11:17 PM

Originally Posted by Codeplug

>> Most of post #11 was about your claim of wchar_t being "portable". There are NO portable guarantees for wchar_t as it relates to character sets and encodings - except that a wchar_t can represent any char. As an integer type, it is as-portable-as int. The sizeof both are implementation defined.

>> 1) ... roundtrippable
And wchar_t does not provide for this "across all system boundaries", as stated earlier, since system A may use one encoding/character set while system B uses something entirely different to represent wchar_t's. The only Unicode encoding that does provide for this is UTF8 using char's, since endianness comes into play for the other UTF's.

I think we are saying the same thing from two different points of view...

As soon as an application starts to look at the content, things change, just like Schrödinger's cat or Heisenberg's uncertainty principle. As soon as you start talking about the meaning of the encoded information, everything does become implementation dependent.

Consider the following sequence.

a) A file exists with a character encoding of "X".
b) This file is read and processed by an application which supports encoding "X".
c) A new file is written with encoding "X".
d) A different application on a different platform with a different sizeof(wchar_t), that ALSO supports encoding "X", reads and processes the file.

The internal byte representations in the two applications may be totally different, but the usage of wchar_t as the internal processing mechanism will not destroy the portability of the information.
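To make the sequence above concrete, here is a rough sketch of steps (b) and (c), taking UTF-8 as encoding "X". The function names `utf8_to_wide` and `wide_to_utf8` are made up for illustration. The sketch assumes the input is well-formed UTF-8 (no validation is done), and it assumes a 2-byte wchar_t holds UTF-16 code units while a wider wchar_t holds whole code points. The standard guarantees neither, which is exactly the implementation-dependent part being discussed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 into wchar_t units.
// ASSUMPTION: input is well-formed UTF-8; malformed input is not detected.
// ASSUMPTION: a 2-byte wchar_t stores UTF-16 units, a wider one stores code points.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // Narrow wchar_t: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}

// Encode wchar_t units back to UTF-8 (same assumptions as above).
std::string wide_to_utf8(const std::wstring& in) {
    const std::uint32_t mask = (sizeof(wchar_t) == 2) ? 0xFFFFu : 0xFFFFFFFFu;
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]) & mask;
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            // Narrow wchar_t: recombine the surrogate pair into one code point.
            std::uint32_t low = static_cast<std::uint32_t>(in[++i]) & mask;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With these assumptions, the std::wstring in the middle has a different length and byte layout depending on sizeof(wchar_t), yet `wide_to_utf8(utf8_to_wide(s))` should give back the same bytes on either platform. The external encoding carries the portability; wchar_t is only the working form.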

Because of this, it is critical to make use of the proper encoding classes when manipulating the data, and not every application or platform will support every encoding.

But the act of using wchar_t, per se, does NOT mean that the application is non-portable. What you DO with the information that is stored in the wchar_t-based variables is a completely different story.
