For some strange reason I always assumed that mbstowcs (and its associated functions) were Microsoft-specific, but when I googled just now, it looked like those functions are available on other platforms too.
So what kind of wide characters would we get on Linux or OS X? Would it give us UTF-8 on those platforms, or would we still get Windows-style (2-byte) characters?
"A problem well stated is a problem half solved." - Charles F. Kettering
The size and encoding of wchar_t are implementation-defined. On Windows, wchar_t is always 2 bytes and holds UTF-16LE. On many *nix flavors, wchar_t is 4 bytes and holds UTF-32 in native endianness.
The narrow encoding that mbstowcs assumes comes from the LC_CTYPE category of the current locale.
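A quick way to see what a given compiler and C library actually do is to print the size of wchar_t. The following is only a minimal sketch: describe_wchar is a made-up helper name, and __STDC_ISO_10646__ is the standard macro an implementation may define when wchar_t values are Unicode code points (glibc defines it; do not assume every platform does).

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reports how wide wchar_t is on this platform
   and returns its size in bytes. */
size_t describe_wchar(FILE *out)
{
    fprintf(out, "sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
    fprintf(out, "WCHAR_MAX       = 0x%llX\n", (unsigned long long)WCHAR_MAX);
#ifdef __STDC_ISO_10646__
    /* The implementation promises that wchar_t holds ISO 10646 code points. */
    fprintf(out, "__STDC_ISO_10646__ = %ldL\n", (long)__STDC_ISO_10646__);
#else
    fprintf(out, "__STDC_ISO_10646__ is not defined here\n");
#endif
    return sizeof(wchar_t);
}
```

Calling describe_wchar(stdout) from main should typically report 2 bytes with Visual C++ and 4 bytes with gcc on Linux, in line with the sizes described above.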
Earlier today I was looking into LC_CTYPE, but although I could understand how it might affect character conversions (toupper() / tolower() etc.), I couldn't quite understand what effect it has on character representations.
For example, on Windows, 'non-wide' characters are usually represented as single bytes. The actual character printed depends on the user's code page. On Linux, 'non-wide' characters are usually UTF-8. So what does mbstowcs() do on a Linux system? Does it convert variable-width UTF-8 characters into fixed-width (32-bit) wide characters?
Interesting stuff...!
I'd say "yes" too, but I believe the most effective way to find out is to build a demo app and inspect the memory. That is much more reliable than asking about Linux on a Windows forum.
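Such a demo app could look something like the sketch below. It is not a definitive test: select_utf8_locale and dump_wide are made-up helper names, and the locale names it tries ("C.UTF-8", "en_US.UTF-8", "en_US.utf8") are only assumptions about what might be installed on a given system.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: tries a few commonly installed UTF-8 locale names
   (assumed, not guaranteed) and returns the first one setlocale accepts,
   or NULL if none of them is available. */
const char *select_utf8_locale(void)
{
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Hypothetical helper: converts `narrow` using the current LC_CTYPE and
   prints the numeric value of every resulting wide character.
   Returns the number of wide characters, or (size_t)-1 on failure. */
size_t dump_wide(const char *narrow, wchar_t *wide, size_t capacity)
{
    size_t count = mbstowcs(wide, narrow, capacity);
    if (count == (size_t)-1) {
        fputs("mbstowcs: invalid multibyte sequence for this locale\n", stderr);
        return count;
    }
    for (size_t i = 0; i < count; ++i)
        printf("wide[%zu] = 0x%lX\n", i, (unsigned long)wide[i]);
    return count;
}
```

To use it, call select_utf8_locale() once and then something like dump_wide("\xC3\xA9", buffer, 8), where the two bytes are the UTF-8 encoding of U+00E9. If the locale was accepted, the two input bytes should come back as a single wide character, which is the variable-width to fixed-width conversion asked about above.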