
Thread: Windows Unicode

  1. #1
    John E is offline Elite Member Power Poster
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    Windows Unicode

    As I understand it, Windows Unicode uses 2 bytes per character. In many cases, the upper byte is zero. This seems to be at least true for all characters up to lower case 'z' (0x7A). So the non-unicode string HELLO would be:-

    Code:
    0x48 0x45 0x4C 0x4C 0x4F
       H    E    L    L    O
    whereas its unicode equivalent would be:-

    Code:
    0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F 0x00
       H         E         L         L         O
    At what point do we start getting a non-zero value in that upper byte? Is it after (say) character 0x7F? Or do all the first 256 characters give a zero value for the 2nd byte?

    Basically, I'm trying to figure out if there's a way to convert (very simple) Windows unicode strings to single-byte strings on a non-Windows platform (i.e. where we don't have wcstombs() available). If I knew that each value was always going to be less than 256, could I do a cheap-and-dirty conversion by just taking every alternate byte?
    "A problem well stated is a problem half solved." - Charles F. Kettering

  2. #2
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Windows Unicode

    Victor Nijegorodov

  3. #3
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Windows Unicode

    Quote Originally Posted by John E View Post
    Basically, I'm trying to figure out if there's a way to convert (very simple) Windows unicode strings to single-byte strings on a non-Windows platform (i.e. where we don't have wcstombs() available). If I knew that each value was always going to be less than 256, could I do a cheap-and-dirty conversion by just taking every alternate byte?
    Yes, if you can guarantee that every character in the string comes from the Basic Latin and Latin-1 Supplement blocks (U+0000 to U+00FF), described here:

    http://en.wikipedia.org/wiki/List_of_Unicode_characters

    Otherwise, no.

    Regards,

    Paul McKenzie
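
    To illustrate the "every alternate byte" idea under the condition above, here is a minimal sketch. It assumes the input is little-endian UTF-16 (the layout shown in the HELLO dump in post #1) and that the result is wanted as Latin-1 (ISO-8859-1), not UTF-8. The function name utf16le_to_latin1 is hypothetical, not part of any library. The upper byte stays zero for the first 256 code points and becomes non-zero from U+0100 onwards, so the sketch rejects the input at that point rather than silently dropping data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: narrows little-endian UTF-16 bytes to a Latin-1 string.
// Returns false (leaving 'out' empty) if the byte count is odd or if any
// code unit is above 0xFF, i.e. outside Basic Latin + Latin-1 Supplement.
bool utf16le_to_latin1(const std::vector<unsigned char>& bytes, std::string& out)
{
    out.clear();
    if (bytes.size() % 2 != 0)
        return false;                      // truncated code unit

    std::string result;
    result.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned char low  = bytes[i];      // low byte comes first in UTF-16LE
        const unsigned char high = bytes[i + 1];  // zero for U+0000..U+00FF
        if (high != 0)
            return false;                  // character does not fit in one byte
        result.push_back(static_cast<char>(low));
    }
    out.swap(result);
    return true;
}
```

    Note that bytes 0x80 to 0xFF in the output are Latin-1, so they are not valid UTF-8 on their own; only the range up to 0x7F is identical in both encodings.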

  4. #4
    John E is offline Elite Member Power Poster
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    Re: Windows Unicode

    Thanks guys. It's possible that I might not need it now but that's useful information to have.

    Basically, my (cross-platform) app can launch a Windows child process. On Linux and OS X this is done via a utility called Wine. I might need to pass the child process a file path, which is held as UTF-8 in the main app. I wasn't sure whether Wine would convert the UTF-8 string to a format that Windows can understand, or whether I'd need to do that myself, but I've just been advised that Wine already takes care of this, so fingers crossed.

  5. #5
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Windows Unicode

    If needed, you should be able to use <codecvt> ...

    http://msdn.microsoft.com/en-us/libr...v=vs.100).aspx


    Josuttis gives a few sample usages:

    http://www.cppstdlib.com/

    Click on "examples", then "table of contents of all examples"

    Scroll down to Section 16 ... Internationalization (last 5 examples in
    the group).
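
    As a rough idea of what a <codecvt>-based conversion can look like, here is a minimal sketch using std::wstring_convert with std::codecvt_utf8_utf16. It is not taken from the linked examples. The function names utf8_to_utf16 and utf16_to_utf8 are hypothetical, and char16_t is used instead of wchar_t because wchar_t is 32 bits wide on most non-Windows platforms. Be aware that these facilities are deprecated since C++17, so a compiler may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// from_bytes() throws std::range_error if the input is not valid UTF-8.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
// to_bytes() throws std::range_error on invalid input such as a lone surrogate.
std::string utf16_to_utf8(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

    Unlike dropping every alternate byte, this handles characters above U+00FF as well, at the cost of producing multi-byte UTF-8 sequences for anything beyond 0x7F.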

  6. #6
    John E is offline Elite Member Power Poster
    Join Date
    Apr 2001
    Location
    Manchester, England
    Posts
    4,835

    Re: Windows Unicode

    Very useful links, Philip. Thanks!
