  1. #1
    Join Date
    Mar 2002
    Posts
    290

    ConvertStringToBSTR lost char

    Dear friends:
    I have some code:

    Code:
    char szTemp[10000];//store E6 9D 8E E5 85 8B E5 8B A4 00 00
    
    memset(szTemp, 0, sizeof(szTemp));
    char szTemp1[10000];
    
    *pVal = _com_util::ConvertStringToBSTR(szTemp);
    strcpy(szTemp1, _com_util::ConvertBSTRToString(*pVal));
    //szTemp1 store E6 9D 8E E5 85 8B E5 8B 00 00 00
    so A4 is lost.

    What's the problem?

    I have tried several ways to convert and failed.

    By the way, I need to pass szTemp to VB, which is where I found this problem, so I wrote this code to test the conversion.

    thanks a lot

  2. #2
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    Please, post an example that can be compiled and tested.
    Victor Nijegorodov

  3. #3
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Code:
     
    #include <comdef.h>         
    #include <comutil.h>
    
    #pragma comment(lib, "comsupp.lib")
    
    void main()
    {
    	char szTemp[10000];//store E6 9D 8E E5 85 8B E5 8B A4 00 00
    
    	_bstr_t bstr = "李克勤";
    	memset(szTemp, 0, sizeof(szTemp));
    	
    	int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
    	WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);
    
    	char szTemp1[10000];
    
    	_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
    	strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
    //szTemp1 store E6 9D 8E E5 85 8B E5 8B 00 00 00
    
    }
    Thank you.

  4. #4
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    Well, your last code snippet has nothing to do with the first one.
    And you are making a serious mistake: the third parameter of WideCharToMultiByte
    must be an LPCWSTR, not a _bstr_t.
    It should be:
    bstr.operator const wchar_t*( )

    Besides, it is still not clear how you "store E6 9D 8E E5 85 8B E5 8B A4 00 00" in the char szTemp[10000] variable after you call
    Code:
    memset(szTemp, 0, sizeof(szTemp));
    Victor Nijegorodov

  5. #5
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Sorry for putting the comment in wrong place.

    Code:
    #include <comdef.h> // MFC core and standard components
    #include <comutil.h>

    #pragma comment(lib, "comsupp.lib")

    void main()
    {
        char szTemp[10000];
        _bstr_t bstr = "李克勤";
        memset(szTemp, 0, sizeof(szTemp));

        int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
        WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory

        char szTemp1[10000];

        _bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
        strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
        //store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
    }

  6. #6
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    See below in red ...
    Quote Originally Posted by sm_ch
    Sorry for putting the comment in wrong place.

    Code:
    #include <comdef.h>         // MFC core and standard components
    
    #include <comutil.h>
    
    #pragma comment(lib, "comsupp.lib")
    
    void main()
    {
    	char szTemp[10000];
    	_bstr_t bstr = "李克勤";
    //-----------        what do these magic symbols mean?
    
    	memset(szTemp, 0, sizeof(szTemp));
    	
    	int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
    	WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
    
    	char szTemp1[10000];
    
    	_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
    	strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
    //store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory
    //-------I don't see any difference! -  :confused: 
    
    }
    Victor Nijegorodov

  7. #7
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Sorry for putting the comment in wrong place.

    Code:
    #include <comdef.h> // MFC core and standard components
    #include <comutil.h>

    #pragma comment(lib, "comsupp.lib")

    void main()
    {
        char szTemp[10000];
        _bstr_t bstr = "李克勤";//chinese characters.
        memset(szTemp, 0, sizeof(szTemp));

        int iOutputLength = WideCharToMultiByte(CP_UTF8,0,bstr,-1,NULL,0,NULL,NULL);
        WideCharToMultiByte(CP_UTF8,0,bstr,-1,szTemp,iOutputLength,NULL,NULL);//store E6 9D 8E E5 85 8B E5 8B A4 00 00,saw this with Memory

        char szTemp1[10000];

        _bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
        strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
        //store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
    }



    Please run this program in Visual C++ 6 and you will see the error.

  8. #8
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    I cannot run your program because:
    a) your "chinese characters" are displayed in my VC++6 editor as "???"
    b) my VC++6 editor is a non-Unicode editor (as is yours)
    Victor Nijegorodov

  9. #9
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Thank you for your patience.


    Code:
    #include <comdef.h>         // MFC core and standard components
    
    #include <comutil.h>
    
    #pragma comment(lib, "comsupp.lib")
    
    void main()
    {
    	char szTemp[10000];
    	memset(szTemp, 0, sizeof(szTemp));
    	strcpy(szTemp, "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00\x00");//store E6 9D 8E E5 85 8B E5 8B A4 00 00
    	
    	char szTemp1[10000];
    
    	_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
    	strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
    //store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
    }

  10. #10
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    Well, the last code snippet ran fine for me, and szTemp1 contains exactly the same string as you passed to szTemp:
    E6 9D 8E E5 85 8B E5 8B A4 00
    Victor Nijegorodov

  11. #11
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    So what is the problem?

    I'll try to re-install my Visual C++ tomorrow.






    ---------------------------------------
    God save me. I've tried VC 6.0 on another PC and got the same error.
    Last edited by sm_ch; May 23rd, 2008 at 11:27 AM.

  12. #12
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    I don't know what your "this" and "another" PC mean, nor which version of VC6 you used.
    Mine is VC++6.0 Enterprise Edition with SP 6. It runs under Win XP SP2.
    Victor Nijegorodov

  13. #13
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    I've tried on WinXP SP2 and Win2k Advanced Server with VC 6.0 and got the same error.
    The only difference is that my OSes are Simplified Chinese.


  14. #14
    Join Date
    Nov 2003
    Posts
    1,902

    Re: ConvertStringToBSTR lost char

    In the C++ standard, there are 3 types of "character sets" to consider:

    1) The Basic Source Character Set
    Quote Originally Posted by ISO/IEC 14882:2003(E)
    Character Sets 2.2.1
    The basic source character set consists of 96 characters: the space character, the control characters representing
    horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
    2) Physical Source File Characters
    Quote Originally Posted by ISO/IEC 14882:2003(E)
    Phases of translation 2.1.1.1
    Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set... Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. ...
    3) The Execution Character Set
    Quote Originally Posted by ISO/IEC 14882:2003(E)
    Phases of translation 2.1.1.5
    Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (2.13.2, 2.13.4).

    Character Sets 2.2.3
    The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character ... The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
    So now we need to know the defined behavior of VC++ 6.0.

    2) Members of source and execution character sets
    Quote Originally Posted by MSDN
    The source character set is the set of legal characters that can appear in source files. For Microsoft C, the source character set is the standard ASCII character set.
    In summary, VC++ 6.0 source code should stick to the ASCII character set. For the "basic execution wide-character set", the MS compiler has always used a UTF16(LE) encoding of Unicode characters. If you want to enter a literal Unicode character in the source, and it can't be done using only ASCII characters, then you have to enter it using the universal-character-name construct or hex escape sequences.

    Things are better in VS 2008: http://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx

    So now we are ready to tackle the issues with the following:
    >> _bstr_t bstr = "李克勤";
    Before we can say anything about this, we have to know how the file was actually saved and that depends on the editor. Let's say that the file was saved as UTF8 (with no BOM, or the compiler won't touch it). In this case, the compiler will see the string literal as the string of bytes that make up the UTF8 encoding of that Unicode string. Which is "李克勤", or "E6 9D 8E E5 85 8B E5 8B A4". _bstr_t does not expect a UTF8 encoded char string.

    Under VS 2005/8, this generates a warning:
    Code:
    const char *p = "李克勤";
    
    // warning C4566: character represented by universal-character-name 
    //   '\u674E' cannot be represented in the current code page (1252)
    Here the compiler is following 2.1.1.1, "Any source file character not in the basic source character set is replaced by the universal-character-name...". The UCN is a 16-bit value trying to fit into an 8-bit slot.

    In VS 2005/8, you can now have Unicode characters in your source and see their glyphs - since source files can be saved/edited in Unicode. But you still have to keep in mind that all wide character constants and literals are encoded as UTF16-LE in the execution character set. Which means at run-time, L"李克勤" == "674E 514B 52E4".

    So here's some code that will run on both 6.0 and 2005/8
    Code:
    #include <windows.h>
    #include <comdef.h>
    #include <comutil.h>
    
    #include <iostream>
    #include <iomanip>
    using namespace std;
    
    void print_hexchars(const wchar_t *sz)
    {
        for (; *sz; ++sz)
            cout << hex << setw(2) << int(*sz) << ' ';
        cout << endl << endl;
    }//print_hexchars
    
    int main()
    {
        const char *pUTF8 = "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00";
        
        _bstr_t bstrOfUTF8 = _com_util::ConvertStringToBSTR(pUTF8);
    
        wchar_t wbuff[32];
        int res = MultiByteToWideChar(CP_UTF8, 0, pUTF8, -1, wbuff, 32);
        if (!res)
        {
            cerr << "MultiByteToWideChar failed, le = " 
                 << GetLastError() << endl;
            return 1;
        }//if
    
        cout << "UTF8 -> UTF16(LE) via ConvertStringToBSTR" << endl;
        print_hexchars(bstrOfUTF8);
        
        cout << "UTF8 -> UTF16(LE) via MultiByteToWideChar" << endl;
        print_hexchars(wbuff);
    
        return 0;
    }//main
    Output:
    Code:
    UTF8 -> UTF16(LE) via ConvertStringToBSTR
    e6 9d 17d e5 2026 2039 e5 2039 a4
    
    UTF8 -> UTF16(LE) via MultiByteToWideChar
    674e 514b 52e4
    ConvertStringToBSTR() has no idea that it needs to use CP_UTF8 when calling MultiByteToWideChar (or perhaps mbstowcs).

    gg

  15. #15
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Dear Codeplug:

    I ran your program and found that it returns:

    UTF8 -> UTF16(LE) via ConvertStringToBSTR
    93c9 5ea1 53a0 9355 //not the same as your description

    UTF8 -> UTF16(LE) via MultiByteToWideChar
    674e 514b 52e4

    and in

    Code:
    #include <comdef.h>         // MFC core and standard components
    
    #include <comutil.h>
    
    #pragma comment(lib, "comsupp.lib")
    
    void main()
    {
    	char szTemp[10000];
    	memset(szTemp, 0, sizeof(szTemp));
    	strcpy(szTemp, "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00\x00");//store E6 9D 8E E5 85 8B E5 8B A4 00 00
    	
    	char szTemp1[10000];
    
    	_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
    	strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
    //store E6 9D 8E E5 85 8B E5 8B 00 00 00,saw this with Memory
    }
    I did not use MultiByteToWideChar. I think that a round trip through ConvertStringToBSTR and then ConvertBSTRToString should always leave the string unchanged.

    Thanks a lot.
