-
June 7th, 2012, 04:03 AM
#1
Windows Unicode
As I understand it, Windows Unicode uses 2 bytes per character. In many cases, the upper byte is zero. This seems to be true at least for all characters up to lower case 'z' (0x7A). So the non-Unicode string HELLO would be:-
Code:
0x48 0x45 0x4C 0x4C 0x4F
H    E    L    L    O
whereas its unicode equivalent would be:-
Code:
0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F 0x00
H         E         L         L         O
At what point do we start getting a non-zero value in that upper byte? Is it after (say) character 0x7F? Or do all the first 256 characters give a zero value for the 2nd byte?
Basically, I'm trying to figure out if there's a way to convert (very simple) Windows unicode strings to single-byte strings on a non-Windows platform (i.e. where we don't have wcstombs() available). If I knew that each value was always going to be less than 256, could I do a cheap-and-dirty conversion by just taking every alternate byte?
"A problem well stated is a problem half solved." - Charles F. Kettering
-
June 7th, 2012, 04:35 AM
#2
Re: Windows Unicode
Victor Nijegorodov
-
June 7th, 2012, 04:36 AM
#3
Re: Windows Unicode
Originally Posted by John E
Basically, I'm trying to figure out if there's a way to convert (very simple) Windows unicode strings to single-byte strings on a non-Windows platform (i.e. where we don't have wcstombs() available). If I knew that each value was always going to be less than 256, could I do a cheap-and-dirty conversion by just taking every alternate byte?
Yes, if you can guarantee that every character in the string falls within the Basic Latin and Latin-1 Supplement blocks (U+0000 to U+00FF), described here:
http://en.wikipedia.org/wiki/List_of_Unicode_characters
Otherwise, no.
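For what it's worth, the "every alternate byte" idea could be sketched roughly as below. This is only a minimal sketch: it assumes the input is little-endian (as in the HELLO example above), and narrow_utf16le and narrow_demo are made-up names, not part of any library. It refuses to convert if any upper byte is non-zero rather than silently dropping it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): narrows little-endian 2-byte
// characters to one byte each by keeping every alternate byte.
// Returns false, leaving `out` empty, if the length is odd or if any
// upper byte is non-zero (i.e. a value above 0x00FF).
bool narrow_utf16le(const unsigned char* in, std::size_t len, std::string& out) {
    out.clear();
    if (len % 2 != 0) return false;
    std::string result;
    result.reserve(len / 2);
    for (std::size_t i = 0; i < len; i += 2) {
        if (in[i + 1] != 0) return false;  // upper byte set: cannot narrow
        result.push_back(static_cast<char>(in[i]));
    }
    out.swap(result);
    return true;
}

// Usage sketch with the HELLO bytes from the first post.
void narrow_demo() {
    const unsigned char hello[] = {0x48, 0x00, 0x45, 0x00, 0x4C, 0x00,
                                   0x4C, 0x00, 0x4F, 0x00};
    std::string s;
    const bool ok = narrow_utf16le(hello, sizeof hello, s);
    (void)ok;  // avoids an unused-variable warning when NDEBUG is defined
    assert(ok);
    assert(s == "HELLO");
}
```

Note that the bytes you get back are ISO-8859-1 (Latin-1). They are only valid UTF-8 as well if every value is below 0x80.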
Regards,
Paul McKenzie
-
June 7th, 2012, 06:51 AM
#4
Re: Windows Unicode
Thanks guys. It's possible that I might not need it now but that's useful information to have.
Basically, my (cross-platform) app can launch a Windows child process. On Linux and OS-X this gets done via a utility called Wine. I might need to pass the child process a file path which will be in UTF-8 format in the main app. I wasn't sure whether Wine would convert the UTF-8 string to a format that Windows can understand - or whether I'd need to do that myself - but I've just been advised that Wine already takes care of this, so fingers crossed.
-
June 7th, 2012, 08:17 AM
#5
Re: Windows Unicode
If needed, you should be able to use <codecvt> ...
http://msdn.microsoft.com/en-us/libr...v=vs.100).aspx
Josuttis gives a few sample usages:
http://www.cppstdlib.com/
Click on "examples", then "table of contents of all examples".
Scroll down to Section 16 ... Internationalization (last 5 examples in the group).
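As a rough illustration of the <codecvt> route (a sketch only, not taken from the Josuttis examples), something like the following should convert between UTF-8 and UTF-16 on a C++11 compiler. The function names utf8_to_utf16, utf16_to_utf8 and codecvt_demo are made up for this post. Be aware that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may warn.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrappers around std::wstring_convert.
// Both throw std::range_error if the input is malformed.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}

// Usage sketch: round-trip a plain ASCII string.
void codecvt_demo() {
    const std::u16string wide = utf8_to_utf16("HELLO");
    assert(wide.size() == 5);
    assert(utf16_to_utf8(wide) == "HELLO");
}
```

Unlike taking every alternate byte, this also handles characters above U+00FF, at the cost of producing multi-byte UTF-8 output for anything outside ASCII.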
-
June 7th, 2012, 08:36 AM
#6
Re: Windows Unicode
Very useful links, Philip. Thanks!