-
November 17th, 2012, 10:25 AM
#1
mbstowcs
For some strange reason I always assumed that mbstowcs (and its associated functions) were Microsoft specific but when I googled just now, it looked like those functions are available for other platforms too.
So what kind of wide characters would we get on Linux or OS X? Would it give us UTF-8 on those platforms or would we still get Windows-style (2-byte) characters?
"A problem well stated is a problem half solved." - Charles F. Kettering
-
November 17th, 2012, 01:07 PM
#2
Re: mbstowcs
The size and encoding of a wchar_t are implementation-defined. On Windows, a wchar_t is always 2 bytes and holds UTF-16LE code units. On many *nix flavors, wchar_t is 4 bytes and holds UTF-32 in native byte order.
The narrow encoding that mbstowcs assumes comes from the LC_CTYPE category of the current locale.
gg
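To make the LC_CTYPE point concrete, here is a minimal sketch in C. The helper names select_utf8_ctype and narrow_to_wide are made up for illustration, and the locale names it tries are assumptions, since the set of installed locales differs from system to system.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale.
   The names tried here are assumptions; installed locales vary.
   Returns 1 on success, 0 if none of them is available. */
int select_utf8_ctype(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: decode `narrow` using whatever LC_CTYPE is
   currently active. Returns the number of wide characters stored
   (terminator not counted), or (size_t)-1 if the bytes are not valid
   in the locale's encoding. If the result equals `capacity`, the
   output was not null-terminated. */
size_t narrow_to_wide(const char *narrow, wchar_t *wide, size_t capacity)
{
    return mbstowcs(wide, narrow, capacity);
}
```

If select_utf8_ctype() succeeds, passing the five UTF-8 bytes "\xC3\xA9\xE2\x82\xAC" (é followed by €) to narrow_to_wide should give two wide characters on a typical Linux system. Under a different LC_CTYPE the same bytes may be rejected or decoded differently, which is the dependency described above.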
-
November 17th, 2012, 01:53 PM
#3
Re: mbstowcs
Earlier today I was looking into LC_CTYPE but although I could understand how it might affect character conversions (toupper() / tolower() etc) I couldn't quite understand what effect it has on character representations.
For example on Windows, 'non-wide' characters are usually represented as single bytes. The actual character printed depends on the user's code page. On Linux, 'non-wide' characters are usually UTF-8. So what does mbstowcs() do on a Linux system? Does it convert variable-width UTF-8 characters into fixed-width (32-bit) wide characters?
Interesting stuff...!
-
November 18th, 2012, 04:59 AM
#4
Re: mbstowcs
Originally Posted by John E
For example on Windows, 'non-wide' characters are usually represented as single bytes
Usually, yes. But on Windows a non-wide character is likewise called an MBC, which stands for Multi Byte Character.
Best regards,
Igor
-
November 18th, 2012, 05:23 AM
#5
Re: mbstowcs
Thanks Igor. So does that effectively answer my question...
Originally Posted by John E
what does mbstowcs() do on a Linux system? Does it convert variable-width UTF-8 characters into fixed-width (32-bit) wide characters?
Given that UTF-8 is now the most common multibyte character representation on Linux, I guess the answer is "yes"?
-
November 18th, 2012, 12:58 PM
#6
Re: mbstowcs
I'd say "yes" too, but I believe the most effective way to find out is to build a demo app and inspect the memory. That is much more reliable than asking about Linux on a Windows forum.
Best regards,
Igor
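A demo along the lines Igor suggests could look roughly like this. It is only a sketch: dump_wide_units is a hypothetical helper, and the fixed 64-element buffer is an arbitrary choice for short test strings.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert `narrow` under the current LC_CTYPE and
   write the value of each resulting wchar_t into `out` as hex,
   separated by spaces. Returns the number of wide characters, or
   (size_t)-1 if the conversion fails or `out_size` is 0. */
size_t dump_wide_units(const char *narrow, char *out, size_t out_size)
{
    wchar_t wide[64];   /* arbitrary limit, enough for a short test string */
    size_t n, i, used = 0;

    if (out_size == 0)
        return (size_t)-1;
    out[0] = '\0';

    n = mbstowcs(wide, narrow, 64);
    if (n == (size_t)-1)
        return n;       /* invalid byte sequence for this locale */

    for (i = 0; i < n; ++i) {
        int w = snprintf(out + used, out_size - used, "%s0x%lX",
                         i ? " " : "", (unsigned long)wide[i]);
        if (w < 0 || (size_t)w >= out_size - used) {
            out[used] = '\0';   /* output buffer full: drop the partial item */
            break;
        }
        used += (size_t)w;
    }
    return n;
}
```

Call setlocale(LC_CTYPE, "") from <locale.h> first so that the user's environment decides the narrow encoding. Without that call the program starts in the "C" locale, where non-ASCII input may fail to convert. Printing sizeof(wchar_t) next to the dump shows whether the platform uses 2-byte or 4-byte wide characters.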