  1. #1
    Join Date
    May 2002
    Posts
    1,798

    writing and reading bytes - a Visual Studio issue ?

    I have used the following code to write and read wchar_t bytes from a disk file:

    Code:
    int WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
    {
    	wfstream wf;
        codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
        locale wloc(wf.getloc(), &ccvt);
        wf.imbue(wloc);
    
    	wf.open(wcfilepath, ios::out | ios::binary);
    	if(!wf) { wprintf(_T("Unable to open file %s"), wcfilepath); return 0; }
    	wf.write((wchar_t *) wcp, (streamsize)(nsz));
    	wf.close();
    
    	return 1;
    
    }// WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
    
    /// reads raw bytes from a file all at once
    /// see: http://www.cplusplus.com/reference/istream/istream/tellg/
    int ReadBytesW(wchar_t * wcfilepath, wchar_t * pwbuf, long &lsz)
    {
    	wfstream wf;
        codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
        locale wloc(wf.getloc(), &ccvt);
        wf.imbue(wloc);
    	// see:  http://www.codeguru.com/forum/showthread.php?t=511113
    
    	wf.open(wcfilepath, ios::in|ios::binary);
    	if(!wf) { wprintf( _T("Unable to open file %s"), wcfilepath); return 0; }
      
       // get length of file:
        wf.seekg (0, wf.end);
        int length = wf.tellg();
        wf.seekg (0, wf.beg);
    
    	lsz = length;
    
    	pwbuf = new wchar_t [length+1];
    	wmemset(pwbuf, 0x0000, length+1);
    
    	wf.read(pwbuf, (streamsize) length);
    	
    	wf.close();
    
    	// print content
    	for(int i = 0; i < length/2; i++)
    	{
    		printf("%0.4X ", pwbuf[i]);
    	}
    	printf("\n");
    
    	delete [] pwbuf; pwbuf = 0;
    
    	return 1;
    
    }// ReadBytesW(wstring wsfilepath)
    I have run this simple experiment where the wide byte 0xFFFF is present or absent.
    Code:
    int _tmain(int argc, _TCHAR* argv[])
    {
    	wchar_t wbuf[10];
    
    	wbuf[0] = 0x1234;
    	wbuf[1] = 0x5678;
    	wbuf[2] = 0x9abc;
    	wbuf[3] = 0xef12;
    	wbuf[4] = 0xabcd;
    	wbuf[5] = 0xfe21;
    	wbuf[6] = 0xdcba;
    	wbuf[7] = 0x1f2a;
    	wbuf[8] = 0xefff;
    	wbuf[9] = 0x02ff;
    
    	int n = WriteBytesW(wbuf, 10, _T("bravo.dat"));
    	if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }
    
    	wchar_t * wbuf2 = 0;
    	long nsz = 0;
    	n = ReadBytesW(_T("bravo.dat"), wbuf2, nsz);
    	if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }
    
    	return 0;
    
    }
    Output:
    save bytes succeeded
    1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF
    read bytes succeeded
    nsz =: 20
    Now, if wbuf[8] = 0xefff; is replaced by wbuf[8] = 0xffff;

    Output:
    save bytes succeeded
    1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A 02FF
    read bytes succeeded
    nsz =: 18
    Obviously, the 0xffff wbyte is not read. WHY ?

    This presents a significant problem when attempting to read ALL wbytes from a file. Is there any workaround? Is this a VS problem?
    mpliam

  2. #2
    Join Date
    Nov 2003
    Posts
    1,902

    Re: writing and reading bytes - a Visual Studio issue ?

    First, you don't need to use the _T() macro. If the parameter expects a "const wchar_t*", then just put an L on the front of the literal.

    >> Obviously, the 0xffff wbyte is not read. WHY ?
    It may have something to do with the fact that U+FFFF is not a valid character code point. Have you stepped through it in the debugger? Or do you have Express with no CRT source?
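
    For reference, here is a minimal sketch of what "not a valid character" means for U+FFFF. Unicode reserves 66 noncharacters: the range U+FDD0..U+FDEF, plus the last two code points of every plane (those ending in FFFE or FFFF). The helper name is_noncharacter is made up for illustration; it is not a library function, and whether the stream consults anything like it is only a guess at this point.

```cpp
#include <cstdint>

// Hypothetical helper (not part of any library): true if cp is one of the
// 66 Unicode noncharacters.
constexpr bool is_noncharacter(std::uint32_t cp)
{
    return cp <= 0x10FFFF &&                      // inside the Unicode range
           ((cp >= 0xFDD0 && cp <= 0xFDEF) ||     // contiguous block of 32
            (cp & 0xFFFE) == 0xFFFE);             // U+xxFFFE and U+xxFFFF
}
```

    With the two values from the experiment, is_noncharacter(0xEFFF) is false and is_noncharacter(0xFFFF) is true.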

    gg

  3. #3
    Join Date
    Aug 2004
    Posts
    13,490

    Re: writing and reading bytes - a Visual Studio issue ?

    Works okay on 2012 Win7.

    1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF

  4. #4
    Join Date
    Nov 2003
    Posts
    1,902

    Re: writing and reading bytes - a Visual Studio issue ?

    >> 1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF
    Try it with that set to 0xffff.

    Also, what is the size of the file? Wondering if write() did the initial filtering.
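
    One way to check the size without involving any wide-character conversion is to open the file with a plain char stream positioned at the end. This is only a sketch; the function name file_size_bytes is an assumption of mine, not something from the posted code.

```cpp
#include <fstream>
#include <string>

// Returns the size of the file in bytes, or -1 if it cannot be opened.
// A narrow (char) stream is used so no code conversion takes place.
long long file_size_bytes(const std::string& path)
{
    std::ifstream f(path, std::ios::binary | std::ios::ate); // open at end
    if (!f) return -1;
    return static_cast<long long>(f.tellg());
}
```

    Assuming each element is stored as two bytes with no BOM, 20 bytes would mean all ten values reached the disk, while 18 would mean one was already dropped by write().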

    gg

  5. #5
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: writing and reading bytes - a Visual Studio issue ?

    I could reproduce the issue with VC++ 2010 on XP Pro SP3.

    Quote Originally Posted by Codeplug View Post
    Also, what is the size of the file? Wondering if write() did the initial filtering.
    Your suspicion is right: It's the writing phase where the word gets dropped.

    Quote Originally Posted by Codeplug View Post
    It may have something to do with the fact that U+FFFF is not a valid character code point. [...]
    I initially suspected something in that direction as well, but refrained from posting when I saw that the files are opened in binary mode. Can it still be a Unicode (non-)character issue?

    [...] Have you stepped through it in the debugger? Or do you have Express with no CRT source?
    Express does come with CRT sources, at least the 2010 version.

  6. #6
    Join Date
    May 2002
    Posts
    1,798

    Re: writing and reading bytes - a Visual Studio issue ?

    I've been using Win 7 (64-bit) Ultimate (SvcPk 1) on a Dell XPS 8300. Interestingly, if one merely reads unsigned char bytes from a disk file (even though the buffer has to be cast to (char*) in the call to fstream::read), every byte value from 0x00 to 0xFF is read. Go figure.

    Code:
    int WriteBytesA(unsigned char uc[], int nz, char * sfilepath)
    {
    	fstream f;
    	f.open(sfilepath, ios::out | ios::binary);
    	if(!f) { printf("Unable to open file %s", sfilepath); return 0; }
    	f.write((char *) uc, (streamsize)(nz));
    	f.close();
    	return 1;
    
    }// WriteBytesA(unsigned char * uc, int nz, char * sfilepath)
    
    int ReadBytesA(char * sfilepath, unsigned char * ucbuf, long &lsz)
    {
    	fstream f;
    
    	f.open(sfilepath, ios::in|ios::binary);
    	if(!f) { printf("Unable to open file %s", sfilepath); return 0; }
      
       // get length of file:
        f.seekg (0, f.end);
        long length = (long) f.tellg();
        f.seekg (0, f.beg);
    
    	lsz = length;
    
    	ucbuf = new unsigned char [length+1];
    	memset(ucbuf, 0x00, length+1);
    
    	//f.read(ucbuf, (streamsize) length);     // won't accept this
    	f.read((char*)ucbuf, (streamsize) length);
    	
    	f.close();
    
    	// print content
    	for(int i = 0; i < length; i++)
    	{
    		printf("%0.2X ", ucbuf[i]);
    	}
    	printf("\n");
    
    	delete [] ucbuf; ucbuf = 0;
    
    	return 1;
    
    }// ReadBytesA(char * sfilepath, unsigned char * pucbuf, long &lsz)
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	unsigned char uc[22];
    	uc[0] = 0x34;
    	uc[1] = 0x12;
    	// uc[2] is never assigned, so it holds an indeterminate value (CC in the output below)
    	uc[3] = 0x34;
    	uc[4] = 0x78;
    	uc[5] = 0x56;
    	uc[6] = 0xbc;
    	uc[7] = 0x9a;
    	uc[8] = 0x12;
    	uc[9] = 0xef;
    	uc[10] = 0xcd;
    	uc[11] = 0xab;
    	uc[12] = 0x21;
    	uc[13] = 0xfe;
    	uc[14] = 0xba;
    	uc[15] = 0xdc;
    	uc[16] = 0x2a;
    	uc[17] = 0x1f;
    	uc[18] = 0xff;
    	uc[19] = 0xef;
    	uc[20] = 0xff;
    	uc[21] = 0x02;
    
    	long nz = 22;
    	int n = WriteBytesA(uc, nz, "ssm.dat");
    	if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }
    	nz = 0;
    	memset(uc, 0x00, 22);
    	n = ReadBytesA("ssm.dat", uc, nz);
    	if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }
    	printf("nz = %ld\n", nz);	// nz is a long, so %ld rather than %d
    
    	return 0;
    }
    Output:
    save bytes succeeded
    34 12 CC 34 78 56 BC 9A 12 EF CD AB 21 FE BA DC 2A 1F FF EF FF 02
    read bytes succeeded
    nz = 22
    It may be that when one attempts to write diverse wbytes to a disk file, the safest method would be to first convert all the wbytes into bytes and then save them as bytes (the latter method above). The following code will accomplish this for you and you can control the 'endianness':

    Code:
    /// Converts a wide char (wchar_t) array to an unsigned char (byte) array.
    /// This routine converts the byte order depending upon the byte order marker.
    /// Caller is responsible for allocating and deallocating uc memory.
    /// The difference between bigE and littleE is whether the least significant 
    /// byte is at the lowest address or not.
    /// BOM 
    /// UTF-16 (BE)  0xFEFF  - highest value byte at lowest address index
    /// UTF-16 (LE)  0xFFFE  - lowest value byte at lowest address index 
    int wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[], wchar_t wcbom )
    {
    	printf("wcbom =: %0.4X\n", wcbom);
    	bool bLittleE = false;
    	bool bBigE = false;
    
    	if(wcbom == 0xFEFF) { bBigE = true;  printf("big-endian\n");}
    	if(wcbom == 0xFFFE) { bLittleE = true; printf("little-endian\n");}
    
    	wchar_t wch = ' ';
    	int wdx = 0;
    	for(int i = 0; i < 2 * nsz; i+=2)
    	{
    		wch = wcs[wdx];
    		if(bBigE)
    		{
    			uc[i] = HIBYTE(wch);      // big-endian: most significant byte first
    			uc[i+1] = LOBYTE(wch);
    		}
    		if(bLittleE)
    		{
    			uc[i] = LOBYTE(wch);		// little-endian (x86): least significant byte first
    			uc[i+1] = HIBYTE(wch);
    		}
    		wdx++;
    	}
    	return 2 * nsz;
    
    }//  wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[], wchar_t wcbom )
    It occurs to me that the problem saving wide chars to disk might be a bug.
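
    The same convert-to-bytes idea can also be sketched without the Windows-only LOBYTE/HIBYTE macros. The names ByteOrder and to_bytes16 below are invented for this example, and it assumes the data is held as 16-bit values (as wchar_t is on Windows); treat it as a sketch rather than a drop-in replacement.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteOrder { Little, Big };   // illustrative name, not a library type

// Splits each 16-bit value into two bytes in the requested order.
// Every value, including 0xFFFF, is copied through unchanged.
std::vector<unsigned char> to_bytes16(const std::uint16_t* src, std::size_t count,
                                      ByteOrder order)
{
    std::vector<unsigned char> out;
    out.reserve(count * 2);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char lo = static_cast<unsigned char>(src[i] & 0xFF);
        const unsigned char hi = static_cast<unsigned char>(src[i] >> 8);
        if (order == ByteOrder::Little) { out.push_back(lo); out.push_back(hi); }
        else                            { out.push_back(hi); out.push_back(lo); }
    }
    return out;
}
```

    The resulting vector can then be handed to a narrow stream, for example by passing reinterpret_cast<const char*>(v.data()) and v.size() to ofstream::write, so the bytes never pass through the wide-stream machinery.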
    Last edited by Mike Pliam; June 26th, 2013 at 03:38 PM.
    mpliam

  7. #7
    Join Date
    Oct 2006
    Location
    Sweden
    Posts
    3,654

    Re: writing and reading bytes - a Visual Studio issue ?

    I think Codeplug nailed it. Single stepping the code into CRT shows that 0xFFFF is the wide EOF character so your workaround isn't the way to go.
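
    To illustrate how a 16-bit EOF value can make 0xFFFF vanish, here is a small model. It is not the CRT code and I have not confirmed it matches the exact code path. It assumes a 16-bit int_type with EOF equal to 0xFFFF (on MSVC, wint_t is unsigned short and WEOF is 0xFFFF), and it relies on the documented contract of basic_streambuf::overflow, where an argument equal to EOF means "there is no character to append". Int16, kEof16, ModelSink and write_all are names invented for the sketch.

```cpp
#include <cstdint>
#include <vector>

using Int16 = std::uint16_t;          // stand-in for a 16-bit int_type
constexpr Int16 kEof16 = 0xFFFF;      // stand-in for a 16-bit WEOF

struct ModelSink {
    std::vector<std::uint16_t> stored;

    // Mirrors the overflow() contract: a value equal to EOF means
    // "nothing to append", so it is not stored.
    void overflow(Int16 meta)
    {
        if (meta == kEof16) return;   // indistinguishable from "no character"
        stored.push_back(meta);
    }
};

// Pushes every element through overflow(), as an unbuffered stream would.
std::vector<std::uint16_t> write_all(const std::vector<std::uint16_t>& in)
{
    ModelSink sink;
    for (std::uint16_t ch : in) sink.overflow(ch);
    return sink.stored;
}
```

    In this model, writing { 0x1F2A, 0xFFFF, 0x02FF } stores only two values, while replacing 0xFFFF with 0xEFFF stores all three, which is the same pattern as in the posted output.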

  8. #8
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: writing and reading bytes - a Visual Studio issue ?

    I am thinking it is a bug in wfstream ... you can use fstream instead.
    Below is your code changed to fstream with a few other minor cosmetic
    changes:

    Code:
    // win32_console.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    
    #include <cstdio>
    #include <fstream>
    #include <codecvt>
    
    using namespace std;
    
    int WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
    {
        ofstream wf(wcfilepath, ios::out | ios::binary);
        if(!wf) { wprintf(L"Unable to open file %s", wcfilepath); return 0; }
    
        codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
        locale wloc(wf.getloc(), &ccvt);
        wf.imbue(wloc);
    
        wf.write(reinterpret_cast<char*>(wcp),nsz);
    
        return 1;
    }// WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
    
    /// reads raw bytes from a file all at once
    /// see: http://www.cplusplus.com/reference/istream/istream/tellg/
    int ReadBytesW(wchar_t * wcfilepath, wchar_t * & pwbuf, long &lsz)
    {
        ifstream wf(wcfilepath, ios::in|ios::binary);
        if(!wf) { wprintf( L"Unable to open file %s", wcfilepath); return 0; }
    
        codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
        locale wloc(wf.getloc(), &ccvt);
        wf.imbue(wloc);
        // see:  http://www.codeguru.com/forum/showthread.php?t=511113
    
       // get length of file:
        wf.seekg (0, ios::end);
        int length = wf.tellg();
        wf.seekg (0, ios::beg);
    
        lsz = length;
    
        pwbuf = new wchar_t [length+1];
        wmemset(pwbuf, 0x0000, length+1);
    
        wf.read((char*)pwbuf, (streamsize) length);
        
        wf.close();
    
        printf("length = %d\n",length);
    
        // print content
        for(int i = 0; i < length/2; i++)
        {
            printf("%d : %0.4X \n", i,pwbuf[i]);
        }
        printf("\n");
    
        delete [] pwbuf; pwbuf = 0;
    
        return 1;
    
    }// ReadBytesW(wstring wsfilepath)
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        wchar_t wbuf[10];
    
        wbuf[0] = 0x1234;
        wbuf[1] = 0x5678;
        wbuf[2] = 0x9abc;
        wbuf[3] = 0xef12;
        wbuf[4] = 0xabcd;
        wbuf[5] = 0xfe21;
        wbuf[6] = 0xdcba;
        wbuf[7] = 0x1f2a;
        wbuf[8] = 0xffff;
        wbuf[9] = 0x02ff;
    
        int n = WriteBytesW(wbuf, 10*sizeof(wchar_t), L"bravo.dat");
        if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }
    
        wchar_t * wbuf2 = 0;
        long nsz = 0;
        n = ReadBytesW(L"bravo.dat", wbuf2, nsz);
        if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }
    
        return 0;
    }

  9. #9
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,822

    Re: writing and reading bytes - a Visual Studio issue ?

    Quote Originally Posted by S_M_A View Post
    I think Codeplug nailed it. Single stepping the code into CRT shows that 0xFFFF is the wide EOF character so your workaround isn't the way to go.
    But as the file is being opened in binary mode, should the bit contents of the file or what is being read/written make any difference? I'm inclined to agree with Philip that it looks like a bug in wfstream when in binary mode.

  10. #10
    Join Date
    Oct 2008
    Posts
    1,456

    Re: writing and reading bytes - a Visual Studio issue ?

    Quote Originally Posted by 2kaud View Post
    But as the file is being opened in binary mode, should the bit contents of the file or what is being read/written make any difference? I'm inclined to agree with Philip that it looks like a bug in wfstream when in binary mode.
    No, binary mode doesn't change the fact that the streams read and write elements of type char_type, interpreted according to the corresponding char_traits specialization. In this case, the implementation's choice to define char_traits<wchar_t>::int_type as unsigned short (with 0xFFFF used as EOF) looks legitimate to me. Moreover, I have no experience with character sets, but the resulting behavior of "ignoring" the noncharacter 0xFFFF is consistent with the UTF-16 spec, isn't it?
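
    Rather than take my word for it, the traits can be queried directly. The sketch below uses only standard std::char_traits members; the function names collides_with_eof and wide_int_type_bytes are mine.

```cpp
#include <cstddef>
#include <string>   // std::char_traits

using WTraits = std::char_traits<wchar_t>;

// True if converting ch to int_type yields the EOF marker, i.e. the stream
// machinery cannot tell this character apart from "no character".
bool collides_with_eof(wchar_t ch)
{
    return WTraits::eq_int_type(WTraits::to_int_type(ch), WTraits::eof());
}

// Width in bytes of the int_type that wide streams use on this implementation.
std::size_t wide_int_type_bytes()
{
    return sizeof(WTraits::int_type);
}
```

    On an implementation with a 16-bit int_type and 0xFFFF as EOF, I would expect collides_with_eof(static_cast<wchar_t>(0xFFFF)) to return true. Where wchar_t and int_type are both 32 bits wide it would normally return false, so the result is worth printing on the compiler in question instead of assuming.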
