Reading ASCII Strings into a Unicode Program

**wdolson** · December 13th, 2011, 03:33 AM

I am developing a new app in MFC under VC 2005. Part of the program has to deal with text from files that are used by some old ASCII programs (the programs were created before unicode existed).

Some of these strings are in an INI file. I attempted to read these in using GetPrivateProfileStringA which is supposed to read in the values to char strings which I convert on use to unicode with the A2W macro.

I was successful in doing this from a binary file, but I'm getting weird behavior from the INI file. (In the binary file I parsed it with a custom parser written for this program as it has a unique file format.)

A couple of times I had the INI file read correctly, but then the A2W macro returned NULL when I did the conversion. Most of the time GetPrivateProfileStringA returns odd values. The first string usually had three characters of 0xcd then the ASCII string. Other strings end up having leading NULLs equal to the number of characters read. For example, if it reads the word "Green", the char array will contain:

0x00, 0x00, 0x00, 0x00, 0x00, 'G', 'r', 'e', 'e', 'n', 0x00

upon return and GetPrivateProfileStringA will return 5.

GetPrivateProfileStringW doesn't work. For the first string, it returns three 0xcd bytes followed by the ASCII string with 0x00 padding, but shifted up a byte due to the three 0xcds at the start.

The old ASCII programs make heavy use of GetPrivateProfileString and it doesn't have any problems. It appears that using the ASCII version in a unicode program causes problems. Am I going to have to write a custom parser to get this data out of the INI file? I did some searching and can't find any information on how to read ASCII strings into a unicode program.

I would think this would be a fairly common problem. I must be searching with the wrong terms.

I don't really care when the strings are converted to unicode. I can do it manually after reading them in, or if they can be done at read in time, that's fine too. I just need it to work.
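A minimal sketch of the conversion step, assuming the file really contains only 7-bit ASCII: each byte then maps to the Unicode code point of the same value, so widening one character at a time is enough. The names `widenAscii` and `widenAsciiExample` are made up for illustration. Text that uses an ANSI code page with bytes above 0x7F would need a code-page-aware conversion instead, such as the Win32 MultiByteToWideChar function.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a 7-bit ASCII string to a wide string.
// Throws if a byte is outside the ASCII range, because the meaning of such
// bytes depends on the code page and a plain cast would be wrong.
std::wstring widenAscii(const std::string& narrow)
{
    std::wstring wide;
    wide.reserve(narrow.size());
    for (std::string::size_type i = 0; i < narrow.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(narrow[i]);
        if (byte > 0x7F)
            throw std::invalid_argument("non-ASCII byte in input");
        wide.push_back(static_cast<wchar_t>(byte));
    }
    return wide;
}

void widenAsciiExample()
{
    const std::wstring wide = widenAscii("Green");
    assert(wide == L"Green");
}
```

Rejecting bytes above 0x7F is a deliberate choice here: a stray 0xcd in the buffer is reported at once instead of turning into an unrelated wide character.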

**VictorN** · December 13th, 2011, 04:34 AM

Originally Posted by wdolson

A couple of times I had the INI file read correctly, but then the A2W macro returned NULL when I did the conversion.

Could you show your code with "A2W macro returned NULL"?

Originally Posted by wdolson

Most of the time GetPrivateProfileStringA returns odd values. The first string usually had three characters of 0xcd then the ASCII string.

0xcd in a Debug build marks uninitialized memory. What exactly is the string you are reading from the .ini file? Is the .ini file ANSI or Unicode? (Just open it in a binary editor and you will see!)

Originally Posted by wdolson

Other strings end up having leading NULLs equal to the number of characters read. For example, if it reads the word "Green", the char array will contain:

0x00, 0x00, 0x00, 0x00, 0x00, 'G', 'r', 'e', 'e', 'n', 0x00

upon return and GetPrivateProfileStringA will return 5....

Again: what is the exact string in .ini file (how it looks in a binary editor)?

**wdolson** · December 13th, 2011, 05:12 AM

Originally Posted by VictorN

Could you show your code with "A2W macro returned NULL"?

In the document file (the INI file data is read in CMainFrame at program start)

CString filename = A2W(pFrame->m_CfgData.PinNamePath);

The function includes

USES_CONVERSION;

The string that is getting the 0xcd's is a different string from this one. This one has leading 0x00's.

0xcd in a Debug build marks uninitialized memory. What exactly is the string you are reading from the .ini file? Is the .ini file ANSI or Unicode? (Just open it in a binary editor and you will see!)

The file is ANSI and needs to remain that way because there are programs that use it that are ANSI.

I zero out m_CfgData before I use it, so I don't know where the 0xcds are coming from.

The string from the INI file is:
[MTLAUNCH]
DataPath = E:\C_Proj\Robson\110111USBChanges\DEBUG

Again: what is the exact string in .ini file (how it looks in a binary editor)?

My editor won't copy and paste in hex mode (an old version of UltraEdit). It's definitely ASCII though, not a 00 in the entire file.

The code to read the INI is below. The default value path looks fine in the debugger and this is reading in the string, just with leading characters that are garbage. I have a couple of GetPrivateProfileIntA calls too. Those also get the wrong values.

Code:

    int charsread;
    char path[256];
    char iniFile[256];
    int len=255;

    memset(&m_CfgData,0,sizeof(m_CfgData));
    ::GetCurrentDirectoryA(len,path);

    strncpy_s(iniFile,path,len);
    strncat_s(iniFile,"\\",2);
    strncat_s(iniFile,MTCFGFILE,len);
    charsread = GetPrivateProfileStringA("MTLAUNCH", "DataPath", path, m_CfgData.DataPath, MAXPATH, iniFile);

**wdolson** · December 13th, 2011, 06:38 AM

I found this class which seems to do the job:

http://www.codeproject.com/KB/files/...097&fr=1#xx0xx

I need to do some surgery on my code to fit in the changes, but it gets the ASCII string and converts it to unicode for me, so that's one less step to deal with.

Bill

**Paul McKenzie** · December 13th, 2011, 08:32 AM

Originally Posted by wdolson

I zero out m_CfgData before I use it, so I don't know where the 0xcds are coming from.

Code:

memset(&m_CfgData,0,sizeof(m_CfgData));

Please tell us how m_CfgData is declared. This line of code looks suspect: note the "&" in front of m_CfgData. If it's an array, get rid of the "&", because the array name already converts to a pointer to its first element.

If it's a pointer, the line is wrong: you weren't zeroing out the data it points to, you were zeroing out the pointer variable itself.
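Whether the "&" matters depends on what m_CfgData is. The sketch below uses hypothetical names (`PlainConfig`, `allZero`, `memsetAddressExample`) to show what memset() receives for an array and for a struct of plain data. It assumes nothing about the real declaration, which has not been shown at this point in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct that holds only plain data.
struct PlainConfig
{
    char dataPath[16];
    int  drvType;
};

// Returns true when every byte in [p, p + n) is zero.
bool allZero(const void* p, std::size_t n)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i)
        if (bytes[i] != 0)
            return false;
    return true;
}

void memsetAddressExample()
{
    char buffer[16];
    std::memset(buffer, 0xCD, sizeof(buffer));   // mimic the debug fill pattern

    // For a real array, buffer and &buffer differ in type but not in address.
    assert(static_cast<void*>(buffer) == static_cast<void*>(&buffer));
    std::memset(&buffer, 0, sizeof(buffer));
    assert(allZero(buffer, sizeof(buffer)));

    // A struct does not convert to a pointer, so the & is required.
    PlainConfig cfg;
    std::memset(&cfg, 0, sizeof(cfg));
    assert(allZero(&cfg, sizeof(cfg)));
}
```

For an array the two spellings reach the same bytes, and for a struct the "&" is required. The dangerous case is a pointer variable, where "&" would clear the pointer rather than the data it refers to.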

Regards,

Paul McKenzie

**wdolson** · December 13th, 2011, 09:53 PM

m_CfgData is a struct with several strings and some ints. I've tried defining it a few different ways.

I removed the memset for now. Removing the & from in front of the m_CfgData won't compile. The function wants a pointer to the structure.

I'm using a completely different way to read the INI file now and I'm still getting weird stuff. I tried defining the strings as CStrings, but when I went to use them, if I moused over them, the debugger says <Bad Pointer>. If I expand the + box on the tool tip, it says ATL::CSimpleStringT <wchar_t,0>. I haven't seen that before, but I don't usually mouse over CStrings, they just tend to work most of the time.

I experimented declaring DataPath as a TCHAR array

Code:

typedef struct
{
    //CString DataPath;
    TCHAR DataPath[MAXPATH];
    int  DrvType;
    int  DrvPort;
    CString ConfigPath;
    CString MapPath;
    CString PinNamePath;
    CString SocketPath;
    CString CalPath;
    CString clrBackground;
    CString clrGrid;
    int  MeasureSpec;
    int  SuperMeasure;
}IniData;

I'm getting very weird behavior. I looked at DataPath in the memory view window and it, of course, comes up as all 0xcd. Then I did

m_CfgData.DataPath[0] = 0L;

This set the 4th and 5th bytes in the string to 0. The first three bytes are still 0xcd. It appears for writing purposes, the first entry starts at the 4th byte, but for reading, the first entry starts at the first byte.

I'm seeing this same thing when I try to write a string to the variable. The first three bytes are unchanged, but everything that reads the string sees 0xcd as the first character.

This is completely bizarre. I've been writing C and C++ for over 20 years and have never seen anything like this. I have not worked all that much with multibyte character sets, but I don't think this is working the way it's supposed to.

Bill

**Paul McKenzie** · December 14th, 2011, 01:11 AM

Originally Posted by wdolson

m_CfgData is a struct with several strings and some ints. I've tried defining it a few different ways.

I removed the memset for now. Removing the & from in front of the m_CfgData won't compile. The function wants a pointer to the structure.

You cannot call memset() on a structure that is not POD.

I'm using a completely different way to read the INI file now and I'm still getting weird stuff. I tried defining the strings as CStrings, but when I went to use them, if I moused over them, the debugger says <Bad Pointer>. If I expand the + box on the tool tip, it says ATL::CSimpleStringT <wchar_t,0>. I haven't seen that before, but I don't usually mouse over CStrings, they just tend to work most of the time.

The problem is simple -- you think that C++ is 'C'.

Your structure contains CStrings -- you cannot use low-level C functions to manipulate or initialize these structures. You must properly construct and initialize these objects. Functions such as memset(), memcpy(), ZeroMemory(), etc. just blindly go through your objects and rip them to shreds. They know nothing about v-tables, destructors, constructors, etc.

In C++, you have two basic types, POD (Plain Old Data) and non-POD. The POD types are compatible with 'C' types. It's the non-POD types that cannot be treated as 'C' types. Example:

Code:

CString s;
memset(&s, 0, sizeof(CString));  // WRONG: overwrites the CString's internal state

This code, which is basically what you tried to do, is absolutely no good. That's why you're getting all sorts of weird behaviour -- you're treating C++ types as 'C' types.
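A compiler can check this distinction for you. The following is a sketch under two assumptions: a C++11 compiler (newer than the VC 2005 used in this thread), and std::wstring standing in for CString so that it builds without MFC. The struct names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical struct that holds only plain data.
struct PlainSettings
{
    wchar_t dataPath[260];
    int     drvType;
    int     drvPort;
};

// Hypothetical struct with one class-type member (stand-in for CString).
struct MixedSettings
{
    wchar_t      dataPath[260];
    int          drvType;
    std::wstring configPath;   // has its own constructor and destructor
};

// Byte-wise functions such as memset() and memcpy() are only appropriate
// for trivially copyable types; the compiler can verify that property.
static_assert(std::is_trivially_copyable<PlainSettings>::value,
              "PlainSettings may be cleared with memset()");
static_assert(!std::is_trivially_copyable<MixedSettings>::value,
              "MixedSettings must be initialized by its constructors");

void traitsExample()
{
    PlainSettings plain;
    std::memset(&plain, 0, sizeof(plain));   // fine: only plain data inside
    assert(plain.drvType == 0);

    MixedSettings mixed;                     // constructors run; no memset()
    mixed.drvType = 0;
    assert(mixed.configPath.empty());
}
```

Placing a static_assert like this next to any memset() call turns the mistake into a compile error as soon as someone adds a class-type member to the struct.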

Regards,

Paul McKenzie

**wdolson** · December 14th, 2011, 01:15 AM

I tried moving the code to the document, and it loads OK. There is something weird with loading it in the main frame. I found I could get the strings to load if I created a TCHAR string at the top of the struct and set element 0 to 0L, but again it set the 4th and 5th byte, not the 1st two bytes.

Then when I tried to use the strings in the document, by just copying one CString to another, the recipient CString didn't get anything. The = operator did nothing.

It works OK when I put the code in the document, but I can't leave it there. Loading a copy every time the document loads is a waste of space.

Where would be a good place to load this if the main frame won't work?

Bill

**Paul McKenzie** · December 14th, 2011, 01:19 AM

Originally Posted by wdolson

I tried moving the code to the document, and it loads OK. There is something weird with loading it in the main frame.

Again, you cannot write code as you claim you wrote and expect correct behaviour.

Then when I tried to use the strings in the document, by just copying one CString to another, the recipient CString didn't get anything. The = operator did nothing.

That is because you "ripped it to shreds" by zeroing out the CString's guts calling memset(), memcpy(), what have you.

You completely trashed the internal reference counting system used by CString. The reference counting system of CString is responsible for copying CStrings correctly while using a single buffer, and memset() or memcpy() called on CStrings or structs/classes that contain CStrings will make reference counting not work. Once you do that, your program isn't going to work, or it will work erratically.
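To make the idea of a shared, counted buffer concrete, here is a toy class. It is not how CString is implemented; `SharedText` and its members are hypothetical, and std::shared_ptr does the counting so that the sketch stays short.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy reference-counted string: several objects share one counted buffer.
class SharedText
{
public:
    explicit SharedText(const std::wstring& text)
        : buffer_(std::make_shared<std::wstring>(text)) {}

    // The compiler-generated copy constructor and operator= copy the
    // shared_ptr, which bumps the shared count; no characters are copied.

    const std::wstring& str() const { return *buffer_; }
    long owners() const { return buffer_.use_count(); }
    bool sharesBufferWith(const SharedText& other) const
    { return buffer_ == other.buffer_; }

private:
    std::shared_ptr<std::wstring> buffer_;   // pointer to buffer plus count
};

void sharedTextExample()
{
    SharedText original(L"pins.txt");
    SharedText copy = original;              // proper C++ copy
    assert(copy.sharesBufferWith(original));
    assert(original.owners() == 2);
    // memset(&copy, 0, sizeof(copy)) would wipe the pointer without updating
    // the count: undefined behaviour, so it is deliberately not executed.
}
```

The object itself holds little more than a pointer, which is why overwriting its bytes leaves the count and the buffer out of step with the objects that still refer to them.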

So it doesn't make any difference what you're trying to do if you're writing code that leads to undefined behaviour.

Regards,

Paul McKenzie

**wdolson** · December 14th, 2011, 01:49 AM

Moving the loading code to the app class works.

I am still wondering why it wouldn't work right in the main frame.

Bill

**Paul McKenzie** · December 14th, 2011, 02:04 AM

Originally Posted by wdolson

Moving the loading code to the app class works.

I am still wondering why it wouldn't work right in the main frame.

Bill

And again, if you're writing code that leads to undefined behaviour, there is no guarantee if that code is actually working, or if you're just lucky that it hasn't gone down in flames.

If it's an MFC-based architectural reason why the code doesn't work, that's one story -- but I am not convinced of this, as your previous posts show that you were doing things incorrectly with respect to handling non-POD types.

Regards,

Paul McKenzie

**Paul McKenzie** · December 14th, 2011, 02:19 AM

This is what I'm saying -- let's go back to your post here:

Code:

typedef struct
{
    TCHAR DataPath[MAXPATH];
    int  DrvType;
    int  DrvPort;
    CString ConfigPath;
    CString MapPath;
    CString PinNamePath;
    CString SocketPath;
    CString CalPath;
    CString clrBackground;
    CString clrGrid;
    int  MeasureSpec;
    int  SuperMeasure;
}IniData;

OK.

First, there is no need for "typedef struct" in a C++ program. That is a holdover from 'C'. Even though it isn't fatal to do the "typedef struct", it is a sign of things to come that are not good, i.e. 'C' coding done on a C++ type.

Code:

struct IniData
{
    TCHAR DataPath[MAXPATH];
    int  DrvType;
    int  DrvPort;
    CString ConfigPath;
    CString MapPath;
    CString PinNamePath;
    CString SocketPath;
    CString CalPath;
    CString clrBackground;
    CString clrGrid;
    int  MeasureSpec;
    int  SuperMeasure;

    IniData() : DrvType(0), DrvPort(0), MeasureSpec(0), SuperMeasure(0)
    { memset(DataPath, 0, sizeof(DataPath)); }
};

Now, the structure is initialized since it now has a default constructor. You do not use memset() to initialize this structure. Now with the default constructor, all fields have data, and the array is filled with 0 when I do this:

Code:

IniData iPath;  // no memset() -- the object has been initialized using the constructor.

I can use memset(), but only on the internal TCHAR array, since the array is a POD type. It would have been safer to use std::fill, as fill() works on POD and non-POD types:

Code:

#include <algorithm>
//....
struct IniData
{
//...
    IniData() : DrvType(0), DrvPort(0), MeasureSpec(0), SuperMeasure(0)
    { std::fill(DataPath, DataPath + MAXPATH, 0); }
};

Copying: When you copy an IniData to another, you don't use memcpy() or any similar function. Do this instead:

Code:

IniData iPath;
IniData iPath2 = iPath;   // copy construction
//or
IniData iPath3;
iPath = iPath3;           // assignment

This is what is expected when handling non-POD types such as IniData.
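A cut-down, compilable version of the same idea, with std::wstring standing in for CString so it builds without MFC. `MiniIniData`, `copyOf`, and `copyExample` are hypothetical names. The point is that "=" copies each member using that member's own copy logic, so the copy stays valid when the original changes.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduced struct; the real IniData has more members.
struct MiniIniData
{
    int          DrvType;
    std::wstring PinNamePath;

    MiniIniData() : DrvType(0) {}   // PinNamePath default-constructs to empty
};

// Copies through the compiler-generated copy constructor.
MiniIniData copyOf(const MiniIniData& source)
{
    MiniIniData result = source;
    return result;
}

void copyExample()
{
    MiniIniData original;
    original.DrvType = 3;
    original.PinNamePath = L"pins.txt";

    MiniIniData copy = copyOf(original);
    original.PinNamePath = L"changed";      // does not disturb the copy

    assert(copy.DrvType == 3);
    assert(copy.PinNamePath == L"pins.txt");
}
```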

Regards,

Paul McKenzie

**wdolson** · December 14th, 2011, 05:06 AM

Thanks for the info. I learned C pretty much formally, but taught myself C++ on the job. There are some gaps in my education.

One thing, I wasn't talking about making a copy of IniData, just a copy of one of the CStrings in it that was already set with data.

For example,

m_CfgData.PinNamePath is set with data when the INI is read. Then later when that path is used, I do this:

CString filename = pFrame->m_CfgData.PinNamePath;

When the m_CfgData structure is filled within the MainFrame, the above setting of filename results in a NULL string, even though I can look at the string in m_CfgData.PinNamePath and it looks correct.

When I move the code to load data into m_CfgData to the app class, or the document class, setting filename works properly with no problems.

My conclusion is there is something odd about putting this sort of thing in the main frame because it works as expected everywhere else.

I will add the initialization code as you suggest just to be proper and to make sure everything starts in a known state.

Bill

**VictorN** · December 14th, 2011, 05:12 AM

Originally Posted by wdolson

...
When I move the code to load data into m_CfgData to the app class, or the document class, setting filename works properly with no problems.

My conclusion is there is something odd about putting this sort of thing in the main frame because it works as expected everywhere else.

I will add the initialization code as you suggest just to be proper and to make sure everything starts in a known state.

Please don't move any code around until your buggy code is fixed. Those movements won't help you at all. Never.

**Paul McKenzie** · December 14th, 2011, 05:28 AM

Originally Posted by wdolson

Thanks for the info. I learned C pretty much formally, but taught myself C++ on the job. There are some gaps in my education.

That can be an issue if you learned 'C' first. The problem is that there are things you can do in 'C' that you cannot do in C++, regardless of how safe or "common sense" it may look.

Basically, 'C' has only one basic type -- POD (Plain Old Data). Everything in 'C' is a POD type -- doubles, ints, pointers, arrays, structs, everything. This means you can call 'C' functions to manipulate these types in one shot, functions such as memset(), memcpy(), ZeroMemory(), malloc(), free(), etc. because there is nothing special or hidden behind the scenes about these types.

However, for C++ you have a second basic type, and those are simply non-POD types. These types have user-defined constructors, destructors, copy constructors, or virtual functions, or are derived from a base class, etc. In other words, there is no compatible equivalent in 'C' for these types. CString is one such non-POD type, and any struct or class that has a CString member automatically becomes non-POD.

This means these entities must be handled especially carefully. No memset(), memcpy(), etc. calls to set the data, no calls to malloc() or calloc() to create these types dynamically, etc. You can only construct these types using classical C++ methods, i.e. construction, using operator new to create them dynamically, using the "=" to assign, and of course, you use the public interface if these types are structs or classes. That's it -- anything else is undefined behaviour.

The CString was a classic example as I stated previously -- there is an internal reference count that the CString class maintains, and wiping out this information with memset() will cause the CString object to not behave properly. Another classic mistake is creating these types with malloc() -- if you were to create a CString with malloc(), you will see the CString is bogus, because it isn't really created. It's just a blob of bytes that does nothing -- as soon as you try to call any member function of CString, you will probably crash.
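The difference between allocating bytes and constructing an object can be sketched as follows. std::wstring stands in for CString (an assumption made so the code builds without MFC), and the function names are hypothetical. If raw memory from malloc() really must be used, the object still has to be constructed in it with placement new and destroyed explicitly.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Usual way: operator new allocates memory AND runs the constructor.
std::wstring* makeWithNew(const wchar_t* text)
{
    return new std::wstring(text);
}

// Raw memory from malloc() holds no object until placement new builds one.
std::wstring* makeInMallocMemory(const wchar_t* text)
{
    void* raw = std::malloc(sizeof(std::wstring));
    if (raw == 0)
        throw std::bad_alloc();
    return new (raw) std::wstring(text);
}

// Matching clean-up: run the destructor explicitly, then free the memory.
void destroyInMallocMemory(std::wstring* s)
{
    typedef std::wstring WString;
    s->~WString();
    std::free(s);
}

void constructionExample()
{
    std::wstring* a = makeWithNew(L"abc");
    assert(*a == L"abc");
    delete a;

    std::wstring* b = makeInMallocMemory(L"Green");
    assert(b->size() == 5);
    destroyInMallocMemory(b);
}
```

In ordinary application code the first form, or simply a local or member variable, is all that is needed. The malloc() variant is shown only to make clear which step plain malloc() skips.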

One thing, I wasn't talking about making a copy of IniData, just a copy of one of the CStrings in it that was already set with data.

For example,

m_CfgData.PinNamePath is set with data when the INI is read. Then later when that path is used, I do this:

CString filename = pFrame->m_CfgData.PinNamePath;

Yes, but as stated, if you've messed up the reference counting aspect of CString, your program might as well crash (unfortunately, these bugs usually don't show up, and you have a program that seems to work but is really faulty).

When the m_CfgData structure is filled within the MainFrame, the above setting of filename results in a NULL string, even though I can look at the string in m_CfgData.PinNamePath and it looks correct.

Well, there is no such thing as a NULL CString. A CString can be empty, but never NULL. Also, it could be that your debugger is not showing you the info you need to see, and it isn't the CString itself. The debugger has its own set of rules on how to display various types. So you need to determine if the CString has actually gone off the deep end, or if it's just the way the debugger is showing you the data.
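One way to take the debugger's display rules out of the picture is to have the program report the string's length and character codes itself. This is a rough sketch with a hypothetical helper named `describe`, written against std::wstring so it builds without MFC. With a real CString the equivalent calls would be GetLength() and GetAt().

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical diagnostic helper: formats the length (decimal) and the first
// few character codes (hex) of a wide string so they can be logged.
std::string describe(const std::wstring& s, std::size_t maxChars = 8)
{
    std::ostringstream out;
    out << "len=" << s.size() << " codes=";
    for (std::size_t i = 0; i < s.size() && i < maxChars; ++i)
    {
        if (i != 0)
            out << ' ';
        out << std::hex << std::uppercase
            << static_cast<unsigned long>(s[i]);
    }
    return out.str();
}

void describeExample()
{
    assert(describe(L"Gr") == "len=2 codes=47 72");
    assert(describe(std::wstring()) == "len=0 codes=");
}
```

A string that the debugger shows as suspicious but that reports a sensible length and codes here is probably intact. Leading zero codes or 0xCD values would point back at the code that filled the string.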

Regards,

Paul McKenzie
