  1. #1
    Join Date
    May 2002
    Posts
    1,798

    How x86 processor and VS 2010 handles wide char arrays endianness

    Once again I'm confused. Take a look at this code.

    Code:
    /// display the wchar_t array
    void printwbytes(wchar_t * pwc, int nsz)
    {
    	printf("\n");
    	for(int i = 0; i < nsz; i++)
    	{ 
    		printf("0x%04X ", (unsigned)pwc[i]);  // numeric value of each element, zero-padded to 4 hex digits
    		//printf("%0.2X ", pwc[i]);  // just prints the low order byte
    		if((i+1) % 8 == 0) { printf("\n"); }
    	}
    	printf("\n");
    
    }// printwbytes(wchar_t * pwc, int nsz)
    
    
    /// change from bigendian to littleendian or vice versa
    int reverseEndian(wchar_t * wcp, int nsz)
    {
    	// Swap the two bytes of each element in place.
    	// (Assumes a 16-bit wchar_t, as on Windows / VS 2010.)
    	for(int i = 0; i < nsz; i++)
    	{
    		wchar_t wch1 = wcp[i];
    		unsigned char lb = wch1 & 0xff;         // low-order byte
    		unsigned char hb = (wch1 >> 8) & 0xff;  // high-order byte
    		wcp[i] = (wchar_t)((lb << 8) | hb);
    	}
    	return nsz;
    
    }// reverseEndian(wchar_t * wcp, int nsz)
    
    
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    
    	wchar_t wcin[256], wcout[256];
    	wcscpy_s(wcin, 256, _T("Nitche was an idiot"));
    	//wcout << wcin << endl;  // not allowed ??
    	wprintf(_T("%s\n"), wcin);
    	printwbytes(wcin, wcslen(wcin));
    	reverseEndian(wcin, wcslen(wcin));
    	printwbytes(wcin, wcslen(wcin));
    	wprintf(_T("%s\n"), wcin);
    	return 0;
    }
    Output:
    Nitche was an idiot

    0x004E 0x0069 0x0074 0x0063 0x0068 0x0065 0x0020 0x0077
    0x0061 0x0073 0x0020 0x0061 0x006E 0x0020 0x0069 0x0064
    0x0069 0x006F 0x0074

    0x4E00 0x6900 0x7400 0x6300 0x6800 0x6500 0x2000 0x7700
    0x6100 0x7300 0x2000 0x6100 0x6E00 0x2000 0x6900 0x6400
    0x6900 0x6F00 0x7400
    ???????????????????
    What I find confusing is the use of 0x[hi byte][lobyte], e.g., 0x004E suggesting big-endianness when one considers the definitions:
    UTF-16 (BE) - most significant byte at lowest address index
    UTF-16 (LE) - least significant byte at lowest address index
    My only explanation is that 0x004E refers to something other than the address of the wide byte. But I cannot fathom why wprintf only accepts the byte order that it does.

    I may just be having a brain fart, but this confusion has caused me considerable difficulty in dealing with encryption algorithms that need to deal extensively with wchar_t. Your thoughts greatly appreciated.
    mpliam

  2. #2
    Join Date
    Nov 2003
    Posts
    1,902

    Re: How x86 processor and VS 2010 handles wide char arrays endianness

    >> ... 0x004E suggesting big-endianness ...
    That does not suggest or imply any endianness.

    > printf("0x%0.4X ", 0x004E);
    What do you expect to see when running this code on a BE vs LE architecture? You will get "0x004E" on both.

    >> My only explanation is that 0x004E refers to something other than the address of the wide byte.
    You are not printing any addresses. You are printing the value contained in a wchar_t type.
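
    The difference between printing a value and looking at its bytes in memory can be sketched as follows. This is only an illustration: `format_value` and `format_memory_bytes` are hypothetical helper names, and `std::uint16_t` stands in for the 16-bit `wchar_t` of VS 2010 so the sketch also builds where `wchar_t` is wider.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Formats the numeric value of a 16-bit code unit.
// The result is the same on every architecture.
std::string format_value(std::uint16_t v) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%04X", static_cast<unsigned>(v));
    return std::string(buf);
}

// Formats the two bytes of v in the order they sit in memory
// (lowest address first). This is the only place endianness shows up.
std::string format_memory_bytes(std::uint16_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02X %02X",
                  static_cast<unsigned>(bytes[0]),
                  static_cast<unsigned>(bytes[1]));
    return std::string(buf);
}
```

    `format_value(0x004E)` gives "0x004E" everywhere, while `format_memory_bytes(0x004E)` gives "4E 00" on a little-endian machine such as x86 and "00 4E" on a big-endian one.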

    gg

  3. #3
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    6,620

    Re: How x86 processor and VS 2010 handles wide char arrays endianness

    Originally Posted by Mike Pliam
    What I find confusing is the use of 0x[hi byte][lobyte], e.g., 0x004E suggesting big-endianness
    You contradict yourself, or mix up two unrelated concepts. 0x004E is mathematical notation, which is abstract and therefore entirely unrelated to endianness. In mathematical notation the high-order byte always goes first, i.e. 0x004E is always 0x004E, the hexadecimal representation of 78. Endianness enters the scene when you put the number into some machine-readable storage, memory or file.

    Memory layout depends on the CPU architecture, e.g. x86 adopts the LE scheme, which means that if you need an x86 CPU to deal with the two-byte number 0x004e, you have to put the bytes in memory in the following order: 0x4e, 0x00. [BTW, unless multibyte numbers are being read by the CPU, the memory layout is allowed to follow any scheme, LE or BE, whichever the application prefers.]
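
    As a rough sketch of the bracketed remark: when the application touches the bytes one at a time, it can choose either order itself, and the result does not depend on the CPU. The helper names below (`load_le16`, `load_be16`, `store_le16`, `store_be16`) are made up for this illustration.

```cpp
#include <cstdint>

// Reads a 16-bit value from two bytes stored low byte first (LE).
std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Reads a 16-bit value from two bytes stored high byte first (BE).
std::uint16_t load_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Writes v to p low byte first (LE).
void store_le16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}

// Writes v to p high byte first (BE).
void store_be16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v >> 8);
    p[1] = static_cast<unsigned char>(v & 0xFF);
}
```

    The bytes 0x4e, 0x00 read with `load_le16` and the bytes 0x00, 0x4e read with `load_be16` both yield the same number, 0x004e. Shifts and masks work on the value, so these helpers behave identically on LE and BE hosts.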

    File storage is allowed to adopt any scheme, LE or BE, as a file is just a chain of bytes that can later be interpreted correctly once you are aware of the rule by which they were put there. To make the byte order unambiguous, a BOM (byte order mark) is put in the leading bytes of text files, or some other measure is taken for pure binary formats, such as a default byte order convention. So, if some file adopts the BE scheme and contains the machine word 0x004e, its bytes will go in the order 0x00, 0x4e, but the word still remains 0x004e.
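
    A minimal sketch of BOM handling for UTF-16 text read from a file, assuming the file content is already in a byte vector. `decode_utf16_bytes` is a hypothetical name, and falling back to a caller-supplied default when no BOM is present is an assumption of this sketch, not a rule from any particular API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a UTF-16 byte stream into 16-bit code units.
// A leading BOM (FE FF = BE, FF FE = LE) selects the byte order and is dropped.
// Without a BOM the caller-supplied default is used.
// A trailing odd byte, if any, is ignored.
std::vector<std::uint16_t> decode_utf16_bytes(const std::vector<unsigned char>& bytes,
                                              bool default_big_endian = true) {
    bool big_endian = default_big_endian;
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            big_endian = true;
            pos = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            big_endian = false;
            pos = 2;
        }
    }
    std::vector<std::uint16_t> units;
    for (; pos + 1 < bytes.size(); pos += 2) {
        unsigned hi = big_endian ? bytes[pos] : bytes[pos + 1];
        unsigned lo = big_endian ? bytes[pos + 1] : bytes[pos];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}
```

    The byte sequences FE FF 00 4E and FF FE 4E 00 both decode to the single code unit 0x004E: the storage order differs, the word does not.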

    So, back to your question: neither the x86 processor nor VS2010 handles wide char endianness for you; the handling is all yours.
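
    To illustrate what "all yours" means in practice, here is a hedged sketch of an explicit byte swap. `swap_bytes16` and `swap_all` are hypothetical names, and `char16_t` stands in for the 16-bit `wchar_t` of VS 2010. Note that swapping changes the values themselves: 0x004E ('N') becomes 0x4E00, which is a different, valid code unit (U+4E00, a CJK ideograph). That is presumably why wprintf showed "?" for the swapped string: it was given different characters that the console could not display, not the same text in another byte order.

```cpp
#include <cstdint>
#include <string>

// Swaps the two bytes of one 16-bit code unit.
std::uint16_t swap_bytes16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Returns a copy of s with the bytes of every code unit swapped.
std::u16string swap_all(const std::u16string& s) {
    std::u16string out;
    out.reserve(s.size());
    for (char16_t c : s) {
        out.push_back(static_cast<char16_t>(
            swap_bytes16(static_cast<std::uint16_t>(c))));
    }
    return out;
}
```

    A swap like this belongs at the boundary where bytes leave or enter the program (file, network, cipher input with a defined byte order). Strings passed to wprintf and friends should stay in the native representation.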
    Last edited by Igor Vartanov; June 26th, 2013 at 02:29 AM.
    Best regards,
    Igor

  4. #4
    Join Date
    May 2002
    Posts
    1,798

    Re: How x86 processor and VS 2010 handles wide char arrays endianness

    A very lucid and beautiful explanation. Thank you.
    mpliam
