CodeGuru Forums
  1. #1
    Join Date
    Aug 2006
    Posts
    230

    C++ Unicode escape sequences and STL: Visual Studio unable to read strings

    I have a line of text in a text file. I opened it in Notepad and saved it with "Unicode" encoding.
    This string contains Unicode escape sequences.
    The Visual Studio project that is supposed to read this file is configured as:
    Use Unicode Character Set.
    In my code I'm using this simple code:
    Code:
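        #include <fstream>
        #include <iostream>
        #include <string>
        using namespace std;

        int main()
        {
            ifstream fp_in;
            fp_in.open("text1_uni.txt", std::ios_base::in);   // open the stream
            string line;
            if (fp_in.is_open())
            {
                // Loop on getline itself: testing eof() before reading runs
                // the loop once more after the last line, when getline fails.
                while (getline(fp_in, line, '\n'))
                {
                    cout << line << '\n';
                }
            }
        }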
    But the string in line is corrupted: only the beginning of the string is shown.
    When I instead save the file I read from as UTF-8 without BOM, the Unicode escape sequences come out as garbage.

  2. #2
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date
    Aug 2004
    Posts
    13,490


    The Unicode character set setting only affects things that know about it, like the CString class and the types defined in tchar.h. It knows nothing of the ANSI string class std::string.

    If you want to solve this problem quickly, use CString instead of std::string.

  3. #3
    Lindley (Elite Member, Power Poster)
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895


    You need to know which encoding of Unicode the file is using. UTF-8 is Unicode; UTF-16LE is Unicode; UTF-32 is also Unicode. Unicode is just a mapping from numbers to symbols, like ASCII.
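    To make that concrete, here is a small sketch that encodes one code point, U+05EA (the first character in the escape sequences quoted later in this thread), in each of the three encodings listed above. The function names are hypothetical, not from any library, and the sketch assumes a valid code point (no range or surrogate checks). The code point stays the same; only the bytes differ.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// UTF-8: 1 to 4 bytes depending on the size of the code point.
Bytes to_utf8(char32_t cp) {
    if (cp < 0x80) return {static_cast<unsigned char>(cp)};
    if (cp < 0x800)
        return {static_cast<unsigned char>(0xC0 | (cp >> 6)),
                static_cast<unsigned char>(0x80 | (cp & 0x3F))};
    if (cp < 0x10000)
        return {static_cast<unsigned char>(0xE0 | (cp >> 12)),
                static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),
                static_cast<unsigned char>(0x80 | (cp & 0x3F))};
    return {static_cast<unsigned char>(0xF0 | (cp >> 18)),
            static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)),
            static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),
            static_cast<unsigned char>(0x80 | (cp & 0x3F))};
}

// UTF-16LE: one 16-bit unit (a surrogate pair above U+FFFF), low byte first.
Bytes to_utf16le(char32_t cp) {
    Bytes out;
    auto put = [&out](std::uint16_t unit) {
        out.push_back(static_cast<unsigned char>(unit & 0xFF));
        out.push_back(static_cast<unsigned char>(unit >> 8));
    };
    if (cp < 0x10000) {
        put(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;
        put(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        put(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// UTF-32LE: always four bytes, low byte first.
Bytes to_utf32le(char32_t cp) {
    return {static_cast<unsigned char>(cp & 0xFF),
            static_cast<unsigned char>((cp >> 8) & 0xFF),
            static_cast<unsigned char>((cp >> 16) & 0xFF),
            static_cast<unsigned char>((cp >> 24) & 0xFF)};
}
```

    For U+05EA this gives D7 AA (UTF-8), EA 05 (UTF-16LE) and EA 05 00 00 (UTF-32LE), while an ASCII letter such as 'A' is the single byte 41 in UTF-8.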

    Once you read the data into your program you have the same problem: you need to know what format it's in, and treat it appropriately. When you set MSVC to "Use Unicode Character Set", you are simply telling it to prefer routines which accept wchar_t strings rather than char strings. These routines are expecting UTF-16LE.
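    This most likely explains the symptom in the first post. A sketch, assuming the file holds ASCII text saved by Notepad as "Unicode" (UTF-16LE with a byte order mark): every ASCII character becomes that byte followed by a zero byte, and a narrow getline copies those zeros into the std::string. Anything that treats the data as a NUL-terminated C string then stops after the first character. The byte values and helper names below are hypothetical, built by hand for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Bytes Notepad's "Unicode" option would write for the text "hi\r\n":
// the UTF-16LE byte order mark (FF FE), then each character as two bytes, low byte first.
const std::string utf16le_bytes("\xFF\xFE" "h\0i\0\r\0\n\0", 10);

// What a narrow std::getline(..., '\n') hands back: everything before the 0x0A byte.
std::string first_line() {
    return utf16le_bytes.substr(0, utf16le_bytes.find('\n'));
}

// What code that stops at the first zero byte (strlen, printf with %s,
// and typically a debugger's string display) sees of that line.
std::size_t c_string_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

    first_line().size() is 8, but c_string_length(first_line()) is only 3 (FF, FE, 'h'), which looks exactly like "only the beginning of the string is shown".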

    You can, of course, read and store the data as UTF-8 instead. This has the nice property that it is compatible with Basic ASCII, meaning that code points 0-127 are encoded the same way in both ASCII and UTF-8. But any Unicode code point over 127 may appear to be "garbage" if you attempt to print it as Extended ASCII when it's really UTF-8.
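    A sketch of that "garbage" effect, assuming the viewer decodes each byte as ISO-8859-1 (Latin-1), where every byte value maps to the code point with the same number; Windows code page 1252 gives the same result for the two bytes used here. The helper name is hypothetical.

```cpp
#include <string>

// Reinterpret each byte of a UTF-8 string as a Latin-1 character,
// which is what happens when UTF-8 data is shown as "Extended ASCII".
std::u32string misread_as_latin1(const std::string& utf8) {
    std::u32string out;
    for (unsigned char byte : utf8) {
        out.push_back(static_cast<char32_t>(byte));  // Latin-1: byte value == code point
    }
    return out;
}
```

    Plain ASCII survives unchanged, but the UTF-8 bytes D7 AA for U+05EA come out as the two unrelated characters U+00D7 (×) and U+00AA (ª).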

  4. #4
    Join Date
    Aug 2006
    Posts
    230


    Hi, and thanks for the fast reply. I found some links about my problem, but I still haven't found a solution there.
    The first link is exactly my problem:
    [url]http://stackoverflow.com/questions/3806215/facebook-graph-api-non-english-album-names[\url]
    and the second one is the JSON framework I'm using:
    [url]http://json.org/[\url]
    How can I get the strings right?

  5. #5
    Join Date
    Apr 1999
    Posts
    27,449


    Quote Originally Posted by umen
    I have a line of text in a text file. I opened it in Notepad and saved it with "Unicode" encoding.
    This string contains Unicode escape sequences.
    If it's Unicode, why are you using std::string? A std::string only knows and can process easily char-sized characters.

    If you know without a doubt that you will be dealing with Unicode, and for some reason you don't want to use CString, then use the proper standard classes. You should be using std::wstring. Not only that, you should be using std::wifstream, std::wofstream, etc., not single-byte streams to read/write Unicode text.

    Also, the C++ standard streams and string class know nothing about whether you're building a Unicode app or not. If you want to use standard classes, the responsibility of choosing the proper ones depending on the build type falls on your shoulders. A std::string, std::ifstream, etc. will always be ANSI char-based, regardless of the build type. Similarly, std::wstring and std::wifstream will always be wide-character based, regardless of whether you're building Unicode or not.
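    One common way to take on that responsibility is a set of aliases keyed on the same UNICODE macro that Visual Studio's "Use Unicode Character Set" option defines, mirroring what tchar.h does for TCHAR. The names tchar_t, tstring and tifstream are hypothetical, not part of the standard library.

```cpp
#include <fstream>
#include <string>

// "Use Unicode Character Set" defines UNICODE and _UNICODE.
// The standard library ignores those macros, so choose the character type explicitly.
#ifdef UNICODE
using tchar_t = wchar_t;
#else
using tchar_t = char;
#endif

using tstring   = std::basic_string<tchar_t>;
using tifstream = std::basic_ifstream<tchar_t>;
```

    This only switches the character type of the interface; it does not by itself say how the bytes in the file are decoded.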

    Also, don't you see that your links are not clickable?

    http://stackoverflow.com/questions/3...sh-album-names
    http://json.org/

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; November 30th, 2011 at 11:40 AM.

  6. #6
    Join Date
    Nov 2003
    Posts
    1,902


    >> std::wstring, std::wifstream, will always be wide-character based
    The interface is wide-based, but both fstream and wfstream assume that files consist of a stream of bytes which are encoded according to the "LC_CTYPE" of the stream's locale.
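    To see that the byte-to-character conversion really is a separate step, here is a sketch that does it by hand rather than relying on the stream's locale: open the file in binary mode so no conversion happens, then combine each pair of bytes into a UTF-16 code unit (little-endian, as Notepad's "Unicode" option writes) and drop the byte order mark. read_utf16le is a hypothetical name; the sketch ignores a trailing odd byte and does not validate surrogate pairs.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole UTF-16LE file into a std::u16string (on Windows, wchar_t is also 16 bits).
std::u16string read_utf16le(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary: bytes in, bytes out
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    std::u16string text;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte.
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    if (!text.empty() && text[0] == 0xFEFF) {
        text.erase(0, 1);  // drop the byte order mark (bytes FF FE)
    }
    return text;
}
```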

    "codecvt" facets can be used to customize the conversion. C++11 has some nice ones: http://msdn.microsoft.com/en-us/library/ee292114.aspx

    >> first link is exactly my problem
    So you are reading a json file which contains escape sequences, like "\u05ea\u05e2", and need to un-escape them?
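    If so, un-escaping is a separate step from reading the file: each \uXXXX escape names a UTF-16 code unit, and characters outside the Basic Multilingual Plane appear as two escapes (a surrogate pair). A minimal sketch, assuming the input is the body of a JSON string literal without its surrounding quotes and producing UTF-8 in a std::string; json_unescape and its helpers are hypothetical, and error handling is reduced to throwing std::runtime_error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace {

// Append code point cp to out as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse exactly four hex digits starting at position pos.
char32_t read_hex4(const std::string& s, std::size_t pos) {
    if (pos + 4 > s.size()) throw std::runtime_error("truncated unicode escape");
    char32_t value = 0;
    for (std::size_t k = pos; k < pos + 4; ++k) {
        char c = s[k];
        int digit = (c >= '0' && c <= '9') ? c - '0'
                  : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                  : (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
        if (digit < 0) throw std::runtime_error("bad hex digit");
        value = value * 16 + static_cast<char32_t>(digit);
    }
    return value;
}

}  // namespace

// Decode JSON backslash escapes into a UTF-8 std::string.
std::string json_unescape(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '\\') { out += s[i]; continue; }
        if (++i == s.size()) throw std::runtime_error("dangling backslash");
        switch (s[i]) {
            case '"':  out += '"';  break;
            case '\\': out += '\\'; break;
            case '/':  out += '/';  break;
            case 'b':  out += '\b'; break;
            case 'f':  out += '\f'; break;
            case 'n':  out += '\n'; break;
            case 'r':  out += '\r'; break;
            case 't':  out += '\t'; break;
            case 'u': {
                char32_t cp = read_hex4(s, i + 1);
                i += 4;  // i now points at the last hex digit
                // A high surrogate must be followed by a second escape holding a low surrogate.
                if (cp >= 0xD800 && cp <= 0xDBFF) {
                    if (i + 2 >= s.size() || s[i + 1] != '\\' || s[i + 2] != 'u')
                        throw std::runtime_error("unpaired surrogate");
                    char32_t low = read_hex4(s, i + 3);
                    if (low < 0xDC00 || low > 0xDFFF) throw std::runtime_error("bad low surrogate");
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                    i += 6;
                }
                append_utf8(out, cp);
                break;
            }
            default: throw std::runtime_error("unknown escape");
        }
    }
    return out;
}
```

    With this, the text \u05ea\u05e2 becomes the four UTF-8 bytes D7 AA D7 A2, which can live in an ordinary std::string.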

    If you are reading JSON files, they should be encoded as UTF-8. If the files are not very large, you could use this code to read them into a wstring on Windows: http://www.codeguru.com/forum/showpo...18&postcount=5
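    The linked code is not reproduced here. As a portable illustration of the same idea, this sketch decodes a UTF-8 std::string into a std::u16string, which has the same 16-bit layout as a std::wstring on Windows (where MultiByteToWideChar with CP_UTF8 does this conversion). utf8_to_utf16 is a hypothetical name; the sketch assumes mostly well-formed input, checks only truncation and bad continuation bytes, and does not reject overlong forms.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-16 code units (surrogate pairs above U+FFFF).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)       { cp = lead;        extra = 0; }
        else if (lead >= 0xF0) { cp = lead & 0x07; extra = 3; }
        else if (lead >= 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xC0) { cp = lead & 0x1F; extra = 1; }
        else throw std::runtime_error("unexpected continuation byte");
        if (extra > in.size() - i - 1) throw std::runtime_error("truncated sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char next = static_cast<unsigned char>(in[i + k]);
            if ((next & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (next & 0x3F);  // 6 payload bits per continuation byte
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```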

    Why not use an existing json parser to do the reading for you?

    gg

  7. #7
    Lindley (Elite Member, Power Poster)
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895


    Quote Originally Posted by Paul McKenzie
    If it's Unicode, why are you using std::string? A std::string only knows and can process easily char-sized characters.
    std::string can be used with Unicode if it's encoded as UTF-8. You have to be careful not to assume that one char is one logical character, but there's no reason you can't use a std::string to move data from one place to another encoded that way.
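    A small sketch of the "one char is not one logical character" caveat: in UTF-8, every byte of the form 10xxxxxx continues the previous character, so counting the bytes that are not continuation bytes gives the number of code points, while size() gives the number of bytes. count_code_points is a hypothetical helper and assumes valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by skipping continuation bytes (10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

    For "ab" followed by U+05EA U+05E2, size() is 6 but count_code_points() is 4.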

    The only times it would make a difference are if you do some form of text processing (search, transformation, etc), on output, or when interacting with an API that expects a particular encoding. Any of those operations requires careful knowledge of the encoding anyway (it's not enough to just say wide or not wide), so it seems to me that UTF-8 is as good as anything and easier for some people to wrap their head around.

    The important thing is to be aware of the encoding, not just the type.

  8. #8
    Join Date
    Apr 1999
    Posts
    27,449


    Quote Originally Posted by Lindley
    std::string can be used with Unicode if it's encoded as UTF-8. You have to be careful not to assume that one char is one logical character, but there's no reason you can't use a std::string to move data from one place to another encoded that way.
    Sure, you are quite right.

    But my point is that if you want "CString-like" behaviour for STL strings with respect to the build type, then the proper C++ standard string type must be chosen by the programmer. Choosing "Unicode character set" in the project settings will not turn a std::string into something that will seamlessly handle 16-bit characters, unlike the CString class.

    Regards,

    Paul McKenzie
