
Thread: mbstowcs

  1. #1
    John E
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    mbstowcs

    For some strange reason I always assumed that mbstowcs (and its associated functions) were Microsoft-specific, but when I googled just now, it looked like those functions are available on other platforms too.

    So what kind of wide characters would we get on Linux or OS X? Would it give us UTF-8 on those platforms, or would we still get Windows-style (2-byte) characters?
    "A problem well stated is a problem half solved." - Charles F. Kettering

  2. #2
    Join Date
    Nov 2003
    Posts
    1,902

    Re: mbstowcs

    The size and encoding of a wchar_t are implementation-defined. On Windows, a wchar_t is always 2 bytes and uses UTF-16LE. On many *nix flavors, wchar_t is 4 bytes and uses UTF-32 with native endianness.

    The narrow encoding that mbstowcs assumes comes from the LC_CTYPE category of the current locale.
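
    To illustrate the LC_CTYPE point, here is a minimal sketch in C. The helper name to_wide is hypothetical (it is not part of any library); it only wraps the standard setlocale and mbstowcs calls, and it assumes the locale name you pass is actually installed on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: selects the narrow encoding via LC_CTYPE, then
   converts. Returns the number of wide characters written (excluding the
   terminator), or (size_t)-1 if the locale is unavailable or the input
   is not valid in that locale's encoding. */
size_t to_wide(const char *locale_name, const char *narrow,
               wchar_t *out, size_t cap)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;           /* locale not installed */
    return mbstowcs(out, narrow, cap);
}
```

    With a UTF-8 locale such as "en_US.UTF-8" (an assumption; locale names vary by system), I would expect to_wide(name, "\xC3\xA9", buf, 8) to yield a single wide character for the two-byte sequence, but that depends on the platform's wchar_t encoding, so it is worth checking rather than taking on trust.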

    gg

  3. #3
    John E
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    Re: mbstowcs

    Earlier today I was looking into LC_CTYPE, but although I could understand how it might affect character conversions (toupper() / tolower() etc.), I couldn't quite understand what effect it has on character representations.

    For example, on Windows, 'non-wide' characters are usually represented as single bytes. The actual character printed depends on the user's code page. On Linux, 'non-wide' characters are usually UTF-8. So what does mbstowcs() do on a Linux system? Does it convert variable-width UTF-8 characters into fixed-width (32-bit) wide characters?

    Interesting stuff...!

  4. #4
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,620

    Re: mbstowcs

    Quote: Originally posted by John E
    For example on Windows, 'non-wide' characters are usually represented as single bytes
    Usually, yes. But on Windows, too, a non-wide character is called an MBC, which stands for Multi Byte Character.
    Best regards,
    Igor

  5. #5
    John E
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    Re: mbstowcs

    Thanks Igor. So does that effectively answer my question...

    Quote: Originally posted by John E
    what does mbstowcs() do on a Linux system? Does it convert variable-width UTF-8 characters into fixed-width (32-bit) wide characters?
    Given that UTF-8 is now the most common (multi-byte) character representation on Linux, I guess the answer is "yes"?

  6. #6
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,620

    Re: mbstowcs

    I'd say "yes" too, but I believe the most effective way to find out is to build a demo app and inspect the memory. That is much more reliable than asking about Linux on a Windows forum.
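
    For the "inspect the memory" step, something along these lines might do. It is only a sketch: hex_dump is a hypothetical helper name, and it simply formats raw bytes so that the contents of a wchar_t buffer can be printed and compared across platforms.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the n bytes at p into out as space-separated
   two-digit hex (for example "e9 00 00 00"). Returns the number of bytes
   formatted; stops early if out is too small. */
size_t hex_dump(const void *p, size_t n, char *out, size_t cap)
{
    const unsigned char *bytes = p;
    size_t i, used = 0;

    if (cap > 0)
        out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* each byte needs 2 hex digits, a separator after the first byte,
           and room for the terminating NUL */
        size_t need = (i == 0) ? 2 : 3;
        if (used + need + 1 > cap)
            break;
        used += (size_t)snprintf(out + used, cap - used,
                                 i == 0 ? "%02x" : " %02x",
                                 (unsigned)bytes[i]);
    }
    return i;
}
```

    After a successful mbstowcs call that returned n, passing the wide buffer and n * sizeof(wchar_t) to hex_dump and printing the result shows how many bytes each character occupies and in which byte order, which is exactly what the question is about.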
    Best regards,
    Igor
