• June 23rd, 2013, 07:27 PM
Mike Pliam
How to set low and high bytes of a wchar_t ?
I have a method of converting a wide char array to an unsigned char array. Now I need to be able to convert back (from unsigned char array to wide char array) but cannot figure out how to do that.

Here's my wc2uc method, which uses built-in macros (BTW, where do these come from? Are they unique to the Windows API or are they part of the C standard?):
Code:

```
int wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[])
{
        // Caller must supply uc with room for at least 2 * nsz bytes, e.g.:
        //uc = new unsigned char [ 2 * nsz + 1 ];
        //memset(uc, 0x00, 2 * nsz + 1);
        wchar_t wch = ' ';
        int wdx = 0;
        for(int i = 0; i < 2 * nsz; i+=2)
        {
                wch = wcs[wdx];
                //uc[i] = LOBYTE(wch);        // low byte first: little-endian (x86)
                //uc[i+1] = HIBYTE(wch);
                uc[i] = HIBYTE(wch);          // high byte first: big-endian
                uc[i+1] = LOBYTE(wch);
                wdx++;
        }
        return 2 * nsz;
}// wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[] )
```
And here's a method that only depends upon what I am certain is native 'C':
Code:

```
        // given a wchar_t get the low and the hi order bytes
        wchar_t wch;
        unsigned char lobyte, hibyte;   // 'byte' is a Windows typedef, not standard C
        wch = 0xABCD;
        lobyte = wch & 0xff;
        hibyte = (wch >> 8) & 0xff;
        printf("wch = %04X\n", (unsigned)wch);  // ABCD
        printf("lobyte =: %02X\n", lobyte);     // CD
        printf("hibyte =: %02X\n", hibyte);     // AB
```
But we run into an lvalue problem if we try to invert the operation:
Code:

```
        wchar_t wch = 0x0000;
        unsigned char ucb = 0x94;
        // set the low byte
        LOBYTE(wch) = ucb;  // Error: Expression must be a modifiable lvalue
```
So how to set the hi and lo bytes of a wchar_t ?
• June 24th, 2013, 06:08 AM
2kaud
Re: How to set low and high bytes of a wchar_t ?
Code:

```
#include <wchar.h>
#include <stdio.h>

int main()
{
wchar_t        wct;
unsigned char lb, ub;

        lb = 0x17;
        ub = 0x15;
        wct = (ub << 8) + lb;
        printf("0x%04x", wct);
        return 0;
}
```
This prints 0x1517. I hope this is what you wanted.
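Applied to a whole buffer, the same shift-and-combine idea might look like the sketch below. The name `ucstowcs` and its parameters are made up for illustration; it assumes the bytes are stored high byte first, which is the order the `wcstoucs` function in the first post writes them, and it only fills the low 16 bits of each `wchar_t`.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical inverse of wcstoucs: rebuilds nwc wide characters from
   2 * nwc bytes stored high byte first. Only the low 16 bits of each
   wchar_t are populated. Returns the number of wide characters written. */
size_t ucstowcs(const unsigned char uc[], size_t nwc, wchar_t wcs[])
{
    for (size_t i = 0; i < nwc; i++) {
        unsigned int hi = uc[2 * i];       /* first byte of the pair  */
        unsigned int lo = uc[2 * i + 1];   /* second byte of the pair */
        wcs[i] = (wchar_t)((hi << 8) | lo);
    }
    return nwc;
}
```

If the bytes were written low byte first instead, swapping the two index expressions should be the only change needed.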
• June 24th, 2013, 12:12 PM
Mike Pliam
Re: How to set low and high bytes of a wchar_t ?
Quote:

This prints 0x1517. I hope this is what you wanted.
Exactly! Thanks very much.
• June 24th, 2013, 02:47 PM
Re: How to set low and high bytes of a wchar_t ?
Quote:

Originally Posted by Mike Pliam
Exactly! Thanks very much.

Are you trying to re-invent MultiByteToWideChar function?
• June 24th, 2013, 05:20 PM
Codeplug
Re: How to set low and high bytes of a wchar_t ?
>> And here's a method that only depends upon what I am certain is native 'C'
The size of wchar_t is implementation defined. On most *nix's it is 4 bytes as UTF32 in native byte-order. On all Windows platforms it's 2 bytes as UTF16LE.
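Because of that, a two-byte split only round-trips when the value actually fits in 16 bits. A rough sketch of a split that reports this (the helper name `split_wchar16` is invented here, not a library function):

```c
#include <wchar.h>

/* Hypothetical helper: stores the low 16 bits of wch as a high and a low
   byte. Returns 1 if wch fits in 16 bits, 0 if higher bits were dropped
   (possible wherever wchar_t is wider than 2 bytes). */
int split_wchar16(wchar_t wch, unsigned char *hi, unsigned char *lo)
{
    unsigned long v = (unsigned long)wch;
    *lo = (unsigned char)(v & 0xFF);
    *hi = (unsigned char)((v >> 8) & 0xFF);
    return v <= 0xFFFFUL;
}
```

On a platform with a 2-byte `wchar_t` the return value is always 1; with a 4-byte `wchar_t` it flags the values that a two-byte encoding would silently truncate.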

>> wcstoucs
That name confused me because UCS is an encoding :)

• June 25th, 2013, 06:53 AM
OReubens
Re: How to set low and high bytes of a wchar_t ?
Quote:

Are you trying to re-invent MultiByteToWideChar function?

Doesn't look like it, since he's not converting anything codepage-wise...

it looks more like he's trying to reinvent a typecast from a wchar_t* to char*, but doing it by copying rather than casting the buffer.
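For comparison, a plain copy of the buffer's raw bytes could be sketched like this. `wcs_raw_bytes` is a made-up name, and note the difference from the original `wcstoucs`: this keeps the platform's native byte order and width instead of forcing two bytes per character, high byte first.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: copies the raw object representation of n wide
   characters into out, which must have room for n * sizeof(wchar_t)
   bytes. Byte order and width are whatever the platform uses.
   Returns the number of bytes written. */
size_t wcs_raw_bytes(const wchar_t *wcs, size_t n, unsigned char *out)
{
    size_t nbytes = n * sizeof(wchar_t);
    memcpy(out, wcs, nbytes);
    return nbytes;
}
```

Copying back with `memcpy` in the other direction restores the wide characters exactly, as long as both sides run on the same platform.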
• June 25th, 2013, 07:00 AM
OReubens
Re: How to set low and high bytes of a wchar_t ?
Quote:

Originally Posted by Codeplug
On all Windows platforms it's 2 bytes as UTF16LE.

This isn't correct.
On Windows NT it's UCS2
On Windows XP it's technically UTF16 but none of the fonts support a codepoint above 0xFFFF

Win95/98/ME had only partial support for the wide-character APIs, which were then also UCS2.

When dealing with networks/file systems, UTF16 (even UCS2) has been flaky in the past; it wasn't until recently that a lot of the issues got cleaned up.
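To make the UCS2/UTF16 difference concrete: a codepoint above 0xFFFF needs two 16-bit units (a surrogate pair) in UTF16, which UCS2 has no way to express. A minimal sketch of the standard encoding rule, with `utf16_encode` being an illustrative name rather than any Windows API:

```c
#include <stdint.h>

/* Illustrative encoder: writes code point cp as UTF-16 code units into
   out and returns how many units were used (1 or 2), or 0 if cp is not
   a valid code point. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range  */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* reserved for surrogates */
    if (cp < 0x10000) {                          /* fits in one unit      */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain        */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate        */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate         */
    return 2;
}
```

Anything that returns 1 here is representable in UCS2 as well; the two-unit case is where the two encodings diverge.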
• June 25th, 2013, 09:05 AM
Codeplug
Re: How to set low and high bytes of a wchar_t ?
>> This isn't correct.
I'm sure readers can google if they are interested in the level-of-support for surrogate pairs in each historical platform - and even then support varies among the modules/apps in each platform:
http://www.i18nguy.com/surrogates.html
http://msdn.microsoft.com/en-us/goglobal/bb688099.aspx
http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx

In the end, it's just easier to say "Windows is UTF16LE".

