writing and reading bytes - a Visual Studio issue ?

**Mike Pliam** · June 26th, 2013, 01:33 PM

I have used the following code to write and read wchar_t bytes from a disk file:

Code:

int WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
{
	wfstream wf;
    codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
    locale wloc(wf.getloc(), &ccvt);
    wf.imbue(wloc);

	wf.open(wcfilepath, ios::out | ios::binary);
	if(!wf) { wprintf(_T("Unable to open file %s"), wcfilepath); return 0; }
	wf.write((wchar_t *) wcp, (streamsize)(nsz));
	wf.close();

	return 1;

}// WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)

/// reads raw bytes from a file all at once
/// see: http://www.cplusplus.com/reference/istream/istream/tellg/
int ReadBytesW(wchar_t * wcfilepath, wchar_t * pwbuf, long &lsz)
{
	wfstream wf;
    codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
    locale wloc(wf.getloc(), &ccvt);
    wf.imbue(wloc);
	// see:  http://www.codeguru.com/forum/showthread.php?t=511113

	wf.open(wcfilepath, ios::in|ios::binary);
	if(!wf) { wprintf( _T("Unable to open file %s"), wcfilepath); return 0; }
  
   // get length of file:
    wf.seekg (0, wf.end);
    int length = wf.tellg();
    wf.seekg (0, wf.beg);

	lsz = length;

	pwbuf = new wchar_t [length+1];
	wmemset(pwbuf, 0x0000, length+1);

	wf.read(pwbuf, (streamsize) length);
	
	wf.close();

	// print content
	for(int i = 0; i < length/2; i++)
	{
		printf("%0.4X ", pwbuf[i]);
	}
	printf("\n");

	delete [] pwbuf; pwbuf = 0;

	return 1;

}// ReadBytesW(wchar_t * wcfilepath, wchar_t * pwbuf, long &lsz)

I have run this simple experiment where the wide byte 0xFFFF is present or absent.

Code:

int _tmain(int argc, _TCHAR* argv[])
{
	wchar_t wbuf[10];

	wbuf[0] = 0x1234;
	wbuf[1] = 0x5678;
	wbuf[2] = 0x9abc;
	wbuf[3] = 0xef12;
	wbuf[4] = 0xabcd;
	wbuf[5] = 0xfe21;
	wbuf[6] = 0xdcba;
	wbuf[7] = 0x1f2a;
	wbuf[8] = 0xefff;
	wbuf[9] = 0x02ff;

	int n = WriteBytesW(wbuf, 10, _T("bravo.dat"));
	if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }

	wchar_t * wbuf2 = 0;
	long nsz = 0;
	n = ReadBytesW(_T("bravo.dat"), wbuf2, nsz);
	if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }

	return 0;

}

Output:

save bytes succeeded
1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF
read bytes succeeded
nsz =: 20

Now, if wbuf[8] = 0xefff; is replaced by wbuf[8] = 0xffff;

Output:

save bytes succeeded
1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A 02FF
read bytes succeeded
nsz =: 18

Obviously, the 0xffff wbyte is not read. WHY ?

This presents a significant problem when attempting to read ALL wbytes from a file. Is there any work around ? Is this a VS problem ?

**Codeplug** · June 26th, 2013, 02:11 PM

First, you don't need to use the _T() macro. If the parameter expects a "const wchar_t*", just put an L in front of the literal.

>> Obviously, the 0xffff wbyte is not read. WHY ?
It may have something to do with the fact that U+FFFF is not a valid character code point. Have you stepped through it in the debugger? Or do you have Express with no CRT source?

gg

**Arjay** · June 26th, 2013, 02:19 PM

Works okay on 2012 Win7.

1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF

**Codeplug** · June 26th, 2013, 02:24 PM

>> 1234 5678 9ABC EF12 ABCD FE21 DCBA 1F2A EFFF 02FF
Try it with that set to 0xffff.

Also, what is the size of the file? Wondering if write() did the initial filtering.

gg

**Eri523** · June 26th, 2013, 02:50 PM

I could reproduce the issue with VC++ 2010 on XP Pro SP3.

Originally Posted by Codeplug

Also, what is the size of the file? Wondering if write() did the initial filtering.

Your suspicion is right: It's the writing phase where the word gets dropped.

Originally Posted by Codeplug

It may have something to do with the fact that U+FFFF is not a valid character code point. [...]

I initially suspected something in that direction as well, but refrained from posting when I saw that the files are opened in binary mode. Can it still be a Unicode (non-)character issue?

[...] Have you stepped through it in the debugger? Or do you have Express with no CRT source?

Express does come with CRT sources, at least the 2010 version.
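The file-size check discussed above can be sketched with a narrow stream, which counts raw bytes and is therefore independent of any wide-character conversion. This is only a minimal sketch: stream_size and file_size_bytes are hypothetical helper names, not part of the code posted in this thread.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the number of bytes in a seekable stream, or -1 on failure.
// The read position is moved back to the beginning afterwards.
long long stream_size(std::istream& in)
{
    in.seekg(0, std::ios::end);
    const std::streamoff end = in.tellg();
    in.seekg(0, std::ios::beg);
    return in ? static_cast<long long>(end) : -1;
}

// Hypothetical helper: size in bytes of the file at `path`,
// or -1 if the file cannot be opened.
long long file_size_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return -1;
    return stream_size(in);
}
```

Calling file_size_bytes("bravo.dat") after the write step would show whether all 20 bytes reached the disk, assuming the file is in the working directory.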

**Mike Pliam** · June 26th, 2013, 03:30 PM

I've been using Win 7 (64-bit) Ultimate (SP1) on a Dell XPS 8300. Interestingly, if one merely writes and reads unsigned char bytes with a narrow fstream (even though the buffer must be cast to (char*) in the call to read()), every byte value from 0x00 to 0xFF comes through. Go figure.

Code:

int WriteBytesA(unsigned char uc[], int nz, char * sfilepath)
{
	fstream f;
	f.open(sfilepath, ios::out | ios::binary);
	if(!f) { printf("Unable to open file %s", sfilepath); return 0; }
	f.write((char *) uc, (streamsize)(nz));
	f.close();
	return 1;

}// WriteBytesA(unsigned char * uc, int nz, char * sfilepath)

int ReadBytesA(char * sfilepath, unsigned char * ucbuf, long &lsz)
{
	fstream f;

	f.open(sfilepath, ios::in|ios::binary);
	if(!f) { printf("Unable to open file %s", sfilepath); return 0; }
  
   // get length of file:
    f.seekg (0, f.end);
    long length = (long) f.tellg();
    f.seekg (0, f.beg);

	lsz = length;

	ucbuf = new unsigned char [length+1];
	memset(ucbuf, 0x00, length+1);

	//f.read(ucbuf, (streamsize) length);     // won't accept this
	f.read((char*)ucbuf, (streamsize) length);
	
	f.close();

	// print content
	for(int i = 0; i < length; i++)
	{
		printf("%0.2X ", ucbuf[i]);
	}
	printf("\n");

	delete [] ucbuf; ucbuf = 0;

	return 1;

}// ReadBytesA(char * sfilepath, unsigned char * pucbuf, long &lsz)

int _tmain(int argc, _TCHAR* argv[])
{
	unsigned char uc[22];
	uc[0] = 0x34;
	uc[1] = 0x12;
	uc[3] = 0x34;
	uc[4] = 0x78;
	uc[5] = 0x56;
	uc[6] = 0xbc;
	uc[7] = 0x9a;
	uc[8] = 0x12;
	uc[9] = 0xef;
	uc[10] = 0xcd;
	uc[11] = 0xab;
	uc[12] = 0x21;
	uc[13] = 0xfe;
	uc[14] = 0xba;
	uc[15] = 0xdc;
	uc[16] = 0x2a;
	uc[17] = 0x1f;
	uc[18] = 0xff;
	uc[19] = 0xef;
	uc[20] = 0xff;
	uc[21] = 0x02;

	long nz = 22;
	int n = WriteBytesA(uc, nz, "ssm.dat");
	if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }
	nz = 0;
	memset(uc, 0x00, 22);
	n = ReadBytesA("ssm.dat", uc, nz);
	if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }
	printf("nz = %ld\n", nz);

	return 0;
}

Output:

save bytes succeeded
34 12 CC 34 78 56 BC 9A 12 EF CD AB 21 FE BA DC 2A 1F FF EF FF 02
read bytes succeeded
nz = 22

It may be that when one attempts to write diverse wbytes to a disk file, the safest method would be to first convert all the wbytes into bytes and then save them as bytes (the latter method above). The following code will accomplish this for you and you can control the 'endianness':

Code:

/// Converts a wide char (wchar_t) array to an unsigned char (byte) array.
/// This routine converts the byte order depending upon the byte order marker.
/// Caller is responsible for allocating and deallocating uc memory.
/// The difference between bigE and littleE is whether the least significant 
/// byte is at the lowest address or not.
/// BOM 
/// UTF-16 (BE)  0xFEFF  - highest value byte at lowest address index
/// UTF-16 (LE)  0xFFFE  - lowest value byte at lowest address index 
int wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[], wchar_t wcbom )
{
	printf("wcbom =: %0.4X\n", wcbom);
	bool bLittleE = false;
	bool bBigE = false;

	if(wcbom == 0xFEFF) { bBigE = true;  printf("big-endian\n");}
	if(wcbom == 0xFFFE) { bLittleE = true; printf("little-endian\n");}

	wchar_t wch = ' ';
	int wdx = 0;
	for(size_t i = 0; i < 2 * nsz; i+=2)
	{
		wch = wcs[wdx];
		if(bBigE)
		{
			uc[i] = HIBYTE(wch);      // big-endian: most significant byte first
			uc[i+1] = LOBYTE(wch);
		}
		if(bLittleE)
		{
			uc[i] = LOBYTE(wch);		// little-endian (x86): least significant byte first
			uc[i+1] = HIBYTE(wch);
		}
		wdx++;
	}
	return 2 * nsz;

}//  wcstoucs(wchar_t wcs[], int nsz, unsigned char uc[], wchar_t wcbom )

It occurs to me that the problem saving wide chars to disk might be a bug.
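For comparison, here is a minimal portable sketch of the same "convert to bytes first" idea, written against plain std::ostream and std::istream so that it works with a std::ofstream or std::ifstream opened with ios::binary. The names write_units_le and read_units_le are hypothetical, and the sketch assumes the data is a sequence of 16-bit units stored least significant byte first.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Writes each 16-bit unit as two bytes, least significant byte first.
// No value is treated specially: 0xFFFF becomes the bytes FF FF.
void write_units_le(std::ostream& out, const std::vector<std::uint16_t>& units)
{
    for (std::uint16_t u : units) {
        const char bytes[2] = {
            static_cast<char>(u & 0xFF),
            static_cast<char>((u >> 8) & 0xFF)
        };
        out.write(bytes, 2);
    }
}

// Reads byte pairs until the stream runs out and rebuilds the 16-bit units.
// A trailing odd byte, if any, is ignored.
std::vector<std::uint16_t> read_units_le(std::istream& in)
{
    std::vector<std::uint16_t> units;
    char bytes[2];
    while (in.read(bytes, 2)) {
        const unsigned lo = static_cast<unsigned char>(bytes[0]);
        const unsigned hi = static_cast<unsigned char>(bytes[1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

Because the stream's character type is char, end of file is signalled out of band and no 16-bit value can be mistaken for it.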

**S_M_A** · June 26th, 2013, 05:09 PM

I think Codeplug nailed it. Single stepping the code into CRT shows that 0xFFFF is the wide EOF character so your workaround isn't the way to go.

**Philip Nicoletti** · June 26th, 2013, 07:41 PM

I am thinking it is a bug in wfstream ... you can use fstream instead. Below is your code changed to fstream with a few other minor cosmetic changes:

Code:

// win32_console.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <cstdio>
#include <fstream>
#include <codecvt>

using namespace std;

int WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)
{
    ofstream wf(wcfilepath, ios::out | ios::binary);
    if(!wf) { wprintf(L"Unable to open file %s", wcfilepath); return 0; }

    codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
    locale wloc(wf.getloc(), &ccvt);
    wf.imbue(wloc);

    wf.write(reinterpret_cast<char*>(wcp),nsz);

    return 1;
}// WriteBytesW(wchar_t * wcp, int nsz, wchar_t * wcfilepath)

/// reads raw bytes from a file all at once
/// see: http://www.cplusplus.com/reference/istream/istream/tellg/
int ReadBytesW(wchar_t * wcfilepath, wchar_t * & pwbuf, long &lsz)
{
    ifstream wf(wcfilepath, ios::in|ios::binary);
    if(!wf) { wprintf( L"Unable to open file %s", wcfilepath); return 0; }

    codecvt_utf16<wchar_t, 0x10ffff, little_endian> ccvt(1);
    locale wloc(wf.getloc(), &ccvt);
    wf.imbue(wloc);
    // see:  http://www.codeguru.com/forum/showthread.php?t=511113

   // get length of file:
    wf.seekg (0, ios::end);
    int length = wf.tellg();
    wf.seekg (0, ios::beg);

    lsz = length;

    pwbuf = new wchar_t [length+1];
    wmemset(pwbuf, 0x0000, length+1);

    wf.read((char*)pwbuf, (streamsize) length);
    
    wf.close();

    printf("length = %d\n",length);

    // print content
    for(int i = 0; i < length/2; i++)
    {
        printf("%d : %0.4X \n", i,pwbuf[i]);
    }
    printf("\n");

    delete [] pwbuf; pwbuf = 0;

    return 1;

}// ReadBytesW(wchar_t * wcfilepath, wchar_t * & pwbuf, long &lsz)

int _tmain(int argc, _TCHAR* argv[])
{
    wchar_t wbuf[10];

    wbuf[0] = 0x1234;
    wbuf[1] = 0x5678;
    wbuf[2] = 0x9abc;
    wbuf[3] = 0xef12;
    wbuf[4] = 0xabcd;
    wbuf[5] = 0xfe21;
    wbuf[6] = 0xdcba;
    wbuf[7] = 0x1f2a;
    wbuf[8] = 0xffff;
    wbuf[9] = 0x02ff;

    int n = WriteBytesW(wbuf, 10*sizeof(wchar_t), L"bravo.dat");
    if(n) { printf("save bytes succeeded\n"); } else { printf("save bytes failed\n"); }

    wchar_t * wbuf2 = 0;
    long nsz = 0;
    n = ReadBytesW(L"bravo.dat", wbuf2, nsz);
    if(n) { printf("read bytes succeeded\n"); } else { printf("read bytes failed\n"); }

    return 0;
}

**2kaud** · June 27th, 2013, 05:13 AM

Originally Posted by S_M_A

I think Codeplug nailed it. Single stepping the code into CRT shows that 0xFFFF is the wide EOF character so your workaround isn't the way to go.

But as the file is being opened in binary mode, should the bit contents of the file or what is being read/written make any difference? I'm inclined to agree with Philip that it looks like a bug in wfstream when in binary mode.

**superbonzo** · June 27th, 2013, 09:22 AM

Originally Posted by 2kaud

But as the file is being opened in binary mode, should the bit contents of the file or what is being read/written make any difference? I'm inclined to agree with Philip that it looks like a bug in wfstream when in binary mode.

No, binary mode doesn't change the fact that the streams read and write elements of type char_type, interpreted according to the corresponding char_traits specialization. In this case, the implementation's choice to define char_traits<wchar_t>::int_type as an unsigned short (with 0xFFFF used as EOF) looks legitimate to me. Moreover, I have no experience with character sets, but the resulting behavior of "ignoring" the noncharacter 0xFFFF is consistent with the UTF-16 spec, isn't it?
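A minimal sketch of how one might check this on a given compiler, using only std::char_traits from the standard library. The helper name collides_with_eof is hypothetical. On an implementation where wchar_t is 16 bits and WEOF is 0xFFFF (as reported above for the Visual C++ CRT), passing wchar_t(0xFFFF) would be expected to return true; with a 32-bit wchar_t it generally would not.

```cpp
#include <cwchar>
#include <string>

typedef std::char_traits<wchar_t> WTraits;

// True when the character, once converted to the stream's int_type,
// cannot be told apart from the end-of-file marker.
bool collides_with_eof(wchar_t ch)
{
    return WTraits::eq_int_type(WTraits::to_int_type(ch), WTraits::eof());
}
```

Any character for which this returns true is at risk of being treated as a failure or end-of-file indication by a wide stream buffer, regardless of whether the file was opened in binary mode.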
