  1. #16
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

Exactly which IDE are you using (VC++ version, service packs installed, Platform SDK, if any)?
An example of my IDE (no problems with your code!):
VC++6.0, SP5 and SP6, Platform SDK from Feb. 2003
    Victor Nijegorodov

  2. #17
    Join Date
    Nov 2003
    Posts
    1,902

    Re: ConvertStringToBSTR lost char

>> I think that ConvertStringToBSTR followed by ConvertBSTRToString should produce no change, unconditionally.
Wrong. You are passing a UTF8 encoded string to a function that does not take a UTF8 string. That's called undefined behavior. How can you possibly expect undefined behavior to do the same thing on two different computers?

    The point of my code was to show that the correct usage of MultiByteToWideChar() does not give the same results as ConvertStringToBSTR() - showing ConvertStringToBSTR() to be incorrect in the first place.

    >> I did not use MultiByteToWideChar.
    MultiByteToWideChar() is the only correct call to use because it's the only call in which you can say, "hey, this parameter is UTF8 encoded".

    You might as well be asking, "Why doesn't printf(0) do the same thing on all my computers?"

    gg
    Last edited by Codeplug; May 26th, 2008 at 08:03 AM.

  3. #18
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Dear Codeplug:

I think I should not have posted the first code, which used WideCharToMultiByte.
It misdirected you.

ConvertStringToBSTR and ConvertBSTRToString are used for handling conversions between BSTR and char*. Do you mean that they use UTF8 internally?

OK. I'm writing a COM method like
    STDMETHOD(get_GetUTF8String)(int i_iLanguageCode, BSTR i_bstrSourceString, /*[out, retval]*/ BSTR *pVal);

Please tell me how to return a char* through BSTR *pVal.

    thank you.

  4. #19
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

    Dear VictorN:

I have tried:
1. February 2003 SDK & VC++ Enterprise 6.0 SP6 & Windows 2000 Advanced Server SP4
2. February 2003 SDK & VC++ Enterprise 6.0 & Windows 2000 Advanced Server SP4
3. VC++ Enterprise 6.0 & Windows XP SP2

    thank you.

  5. #20
    Join Date
    Feb 2005
    Posts
    2,160

    Re: ConvertStringToBSTR lost char

I think the confusion here might be with what a BSTR is physically in memory. Unlike a normal C/C++ character array (or wchar_t array in Unicode builds), which is just characters plus a null terminator, a BSTR has a 4-byte prefix stored *before* its characters that records the string's length in bytes. The BSTR pointer itself points at the first character, not at the start of the allocated block, so it looks like an ordinary null-terminated wchar_t string. The BSTR API functions (SysAllocString, SysStringLen, SysFreeString) know about the hidden prefix and step back to it. That's also why a BSTR may contain embedded nulls - its length comes from the prefix, not from scanning for a terminator - and why you must never pass a plain wchar_t* where a true BSTR is expected, or allocate/free a BSTR with anything but the Sys* functions; the prefix bookkeeping would be wrong and everything gets messed up.

  6. #21
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    Dear hoxsiew!
We all know the BSTR theory here. And I see no problem at all!
I ran the OP's code test (from the post#) on my two PCs (both running XP SP2, i.e. XP Professional, Version 2002, Service Pack 2); the code was compiled/linked in VC++6.0 SP6. The compiler options:
WIN32,_DEBUG,_CONSOLE,_MBCS

The result: this code works 100% correctly on both of my PCs.

How would you explain that?

    PS: well the only difference I see is:
    the result expected by OP:
    E6 9D 8E E5 85 8B E5 8B A4 00 00
    The result I get:
    E6 9D 8E E5 85 8B E5 8B A4 00
but that is also OK, since the second NULL at the end of the string is simply ignored.
    Victor Nijegorodov

  7. #22
    Join Date
    Nov 2003
    Posts
    1,902

    Re: ConvertStringToBSTR lost char

>> It misdirected you.
No, I'm following everything OK.

    >> Do you mean that it use UTF8 internally?
    No. Let me try to state it a little more clearly: ConvertStringToBSTR and ConvertBSTRToString DO NOT WORK with UTF8 encoded strings. They do not convert UTF8 strings, they do not create UTF8 strings. They only convert ASCII char* (no encoding) to/from UTF16(LE). If you try to use these functions for anything else, the result is undefined.

    >> STDMETHOD(get_GetUTF8String)(int i_iLanguageCode, BSTR i_bstrSourceString, /*[out, retval]*/ BSTR *pVal);
First, I want to make clear that a BSTR is just a COM wchar_t string. As I mentioned before, Windows assumes UTF16(LE) encoding for Unicode strings (wchar_t or BSTR). UTF8 is a Unicode encoding designed to be used in a char* string. So using a BSTR to return UTF8 doesn't really make sense. A SafeArray of VT_I1 makes better sense.

    Is "i_bstrSourceString" a Windows-Unicode, UTF16(LE), string?
    Why are you passing in "i_iLanguageCode"? What is that?

    If the source string is Unicode, then the language doesn't matter.

    >> Please tell me how to return char * by BSTR *pVal
I don't see why you would want to do this. A SafeArray of VT_I1 would be a better COM representation of a char* string (and COM handles the memory management for you).

>> I did the OP's code test (from the post#)...The result: this code works 100% correctly on both of my PCs.
Well, the post # doesn't matter, because sm_ch has not posted any code which is correct. There is no point in running incorrect code on multiple PCs and discussing why it behaves differently (or the same) - the behavior is undefined. Just like the behavior of "printf(0)" is undefined. Who cares what it does on different PCs. You just don't do it.

    Here is how you convert to/from UTF8 and UTF16(LE):
    Code:
    #include <windows.h>
    #include <comdef.h>
    #include <comutil.h>
    
    #include <iostream>
    #include <iomanip>
    #include <vector>
    using namespace std;
    
    //-----------------------------------------------------------------------------
    
    void print_hexchars(const char *sz)
    {
        for (; *sz; ++sz)
            cout << hex << setw(2) << int((unsigned char)*sz) << ' ';
    }//print_hexchars
    
    void print_hexchars(const wchar_t *sz)
    {
        for (; *sz; ++sz)
            cout << hex << setw(4) << int(*sz) << ' ';
    }//print_hexchars
    
    //-----------------------------------------------------------------------------
    
    bool WinUnicodeToUTF8(const wchar_t *src, vector<char> &utf8)
    {
        utf8.clear();
    
        // get the required length
        int len = WideCharToMultiByte(CP_UTF8, 0, src, -1, 0, 0, 0, 0);
        if (!len)
            return false;
    
        utf8.resize(len);
        len = WideCharToMultiByte(CP_UTF8, 0, src, -1, &utf8[0], len, 0, 0);
        return len != 0;
    }//WinUnicodeToUTF8
    
    //-----------------------------------------------------------------------------
    
    bool UTF8ToWinUnicode(const char *src, vector<wchar_t> &utf16le)
    {
        utf16le.clear();
    
        // get the required length
        int len = MultiByteToWideChar(CP_UTF8, 0, src, -1, 0, 0);
        if (!len)
            return false;
    
        utf16le.resize(len);
        len = MultiByteToWideChar(CP_UTF8, 0, src, -1, &utf16le[0], len);
        return len != 0;
    }//UTF8ToWinUnicode
    
    //-----------------------------------------------------------------------------
    
    int main()
    {
    #if _MSC_VER >= 1400
        const wchar_t *pUTF16le = L"李克勤";
        // Warning: Do not load and save within a non-Unicode editor
        //          For maximum editor compatibility, use this UCS string instead:
        //             L"\u674e\u514b\u52e4"
    #else
        // no support for UCS or Unicode source code :(
        // use hex characters for compatibility with everything prior to VS 2005
        const wchar_t *pUTF16le = L"\x674e\x514b\x52e4";
    #endif
        
        // Known UTF8 encoding of pUTF16le, for testing
        const char *pUTF8_Known = "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4";
    
        vector<char> vUTF8;
        if (!WinUnicodeToUTF8(pUTF16le, vUTF8))
        {
            DWORD le = GetLastError();
            cout << "WinUnicodeToUTF8() failed, le = " << le << endl;
            return 1;
        }//if
    
        const char *pvUTF8 = &vUTF8[0];
        cout << "UTF16(LE) -> UTF8" << endl;
        print_hexchars(pUTF16le);
        cout << " -> ";
        print_hexchars(pvUTF8);
        cout << endl;
    
        // CRT's strcmp() does a simple byte comparison
        if (strcmp(pvUTF8, pUTF8_Known) != 0)
        {
            cout << "FAILED - pUTF8 != pUTF8_Known" << endl;
            return 1;
        }//if
            
        cout << "It works!" << endl << endl;
    
        // now convert pUTF8_Known to UTF16(LE), should match our pUTF16le
        vector<wchar_t> vUTF16;
        if (!UTF8ToWinUnicode(pUTF8_Known, vUTF16))
        {
            DWORD le = GetLastError();
            cout << "UTF8ToWinUnicode() failed, le = " << le << endl;
            return 1;
        }//if
    
        const wchar_t *pvUTF16 = &vUTF16[0];
        cout << "UTF8 -> UTF16(LE)" << endl;
        print_hexchars(pUTF8_Known);
        cout << " -> ";
        print_hexchars(pvUTF16);
        cout << endl;
    
        if (wcscmp(pvUTF16, pUTF16le) != 0)
        {
            cout << "FAILED - pvUTF16 != pUTF16le" << endl;
            return 1;
        }//if
        
        cout << "It works!" << endl;
    
        return 0;
    }//main
You now have (correct) code for converting UTF16(LE) to/from UTF8. Now you just need to fix your get_GetUTF8String() interface to do the right thing.

    gg
    Last edited by Codeplug; May 27th, 2008 at 03:41 PM.

  8. #23
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: ConvertStringToBSTR lost char

    Quote Originally Posted by Codeplug
>> I did the OP's code test (from the post#)...The result: this code works 100% correctly on both of my PCs.
Well, the post # doesn't matter, because sm_ch has not posted any code which is correct. There is no point in running incorrect code on multiple PCs and discussing why it behaves differently (or the same) - the behavior is undefined. Just like the behavior of "printf(0)" is undefined. Who cares what it does on different PCs. You just don't do it.
    1. Why do you think "sm_ch has not posted any code which is correct"?
    What do you see wrong in this code:
    Code:
    #include <comdef.h>         // MFC core and standard components
    
    #include <comutil.h>
    
    #pragma comment(lib, "comsupp.lib")
    
    void main()
    {
    	char szTemp[10000];
    	memset(szTemp, 0, sizeof(szTemp));
    	strcpy(szTemp, "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4\x00\x00");//store E6 9D 8E E5 85 8B E5 8B A4 00 00
    	
    	char szTemp1[10000];
    
    	_bstr_t bstr1 = _com_util::ConvertStringToBSTR(szTemp);
    	strcpy(szTemp1, _com_util::ConvertBSTRToString(bstr1));
    }
    Where in this code snippet do you see any mention of UTF8 and/or UTF16 conversion?

    Here is how you convert to/from UTF8 and UTF16(LE):
    ....
    You now have (correct) code for converting UTF16(LE) to/from UTF8. Now you just need to fix your get_GetUTF8String() interface to do the right thing.

    gg
Well, I respect your work developing all these conversion functions, but again:
Why does the conversion char* -> _bstr_t -> char* using ConvertStringToBSTR/ConvertBSTRToString not work for the OP?
Note that this would also mean that such methods as
_bstr_t::operator char*
_bstr_t( const char* s2 )
...
_variant_t( const char* strSrc )
_variant_t::SetString
...

won't work!

What does it all depend on?
On the code page used internally by the conversion?
On the character set (for example, only 7-bit characters)?
On something else?
    Victor Nijegorodov

  9. #24
    Join Date
    Nov 2003
    Posts
    1,902

    Re: ConvertStringToBSTR lost char

    >> Where in this code snippet do you see any mention of UTF8 and/or UTF16 conversion?
No mention needed. When you see a char string with characters > 0x7F, you know you don't have a typical ASCII string. By post #3, we saw the use of CP_UTF8, and sm_ch's intention/confusion started to become a little more clear.

>> "\xE6\x9D\x8E\xE5\x85\x8B\xE5\x8B\xA4"
This is the UTF8 encoding of the Unicode characters "李克勤". I probably should have mentioned this in the beginning.

    >> What it all depends on?
ConvertStringToBSTR/ConvertBSTRToString will work as expected, as long as your parameter is a non-encoded ASCII string, or a UTF16(LE) string. Anything else doesn't work, and shouldn't be expected to work. The documentation isn't very clear on this. But if the code samples don't make you a believer, here's the x86 code of ConvertStringToBSTR(), disassembled from comsuppw.lib:
    Code:
      // first call to MultiByteToWideChar to get length
      0000004C: push        0
      0000004E: push        0
      00000050: push        edi
      00000051: push        esi
      00000052: push        0
      00000054: push        0
      00000056: call        dword ptr [__imp__MultiByteToWideChar@24]
    ...
      // second call to MultiByteToWideChar for conversion
      00000107: push        esi
      00000108: push        ebx
      00000109: push        edi
      0000010A: mov         ecx,dword ptr [ebp+8]
      0000010D: push        ecx
      0000010E: push        0
      00000110: push        0
      00000112: call        dword ptr [__imp__MultiByteToWideChar@24]
    As you can see, the first parameter is always CP_ACP. No good for UTF8 work.

    gg

  10. #25
    Join Date
    Mar 2002
    Posts
    290

    Re: ConvertStringToBSTR lost char

I guess it is because the last char is \x00. Maybe the number of non-null bytes in memory must be even, and if not, one is discarded.

    thank you very much.

My purpose is to transfer binary data through a COM interface. I chose BSTR, but that was wrong.
