CodeGuru
  1. #1

    Converting UTF-8 Strings to Unicode.

    Hi Guys

    I am very new to UTF-8.

    I am debugging some code in which a UTF-8 string is converted to wide characters using:

    BSTR unicodestr = SysAllocStringLen(NULL, bufferLen);
    ::MultiByteToWideChar(CP_UTF8, 0, tmpBuffer, -1, unicodestr, bufferLen);

    Here tmpBuffer is declared as
    char *tmpBuffer and holds the value "ÜBERSETZEN1" (German).

    However, after MultiByteToWideChar is called, unicodestr holds the value
    BERSETZEN1, so the first German character is lost.

    I have written some sample code and the behavior is consistent as above.

    I am pretty sure I am missing something. Is there a way to convert it to Unicode so that the whole string is retained?

    Thanks for your help.

    Kandukondein
    Last edited by kandukondein; March 15th, 2011 at 05:55 AM.
    C++ is divine.

  2. #2
    Join Date
    Aug 2008
    Location
    Scotland
    Posts
    379

    Re: Converting UTF-8 Strings to Unicode.

    Hi,

    According to MSDN, the 5th parameter of MultiByteToWideChar is an LPWSTR, not a BSTR.

    BSTR is used for COM and starts with a 4-byte length prefix; maybe that's where the missing character went.

    Alan

  3. #3
    Join Date
    Mar 2011
    Posts
    46

    Re: Converting UTF-8 Strings to Unicode.

    As Alan said above. If you really want the result in a BSTR (http://msdn.microsoft.com/en-us/library/ms221069.aspx), you could do some fancy footwork with typecasts:

    Code:
    LPWSTR P;
    DWORD* Q;
    DWORD i;

    BSTR unicodestr = SysAllocStringLen(NULL, bufferLen);
    P = (LPWSTR) unicodestr;      // BSTR is a pointer, as is P, so simply typecast
    Q = (DWORD*) unicodestr;      // Q points to the first memory of the BSTR as a DWORD
    memset(P, 0, bufferLen * sizeof(WCHAR));  // Zero all the data of the BSTR
    // &P[2] leaves P[0], P[1] as the index; space is -4 because we are writing past the 4-byte index
    i = MultiByteToWideChar(CP_UTF8, 0, tmpBuffer, -1, &P[2], bufferLen-4);
    *Q = (i-1) * 2;   // Fix up the BSTR index length
    It's ugly but it should work
    Last edited by Uglybb; March 16th, 2011 at 06:47 AM.

  4. #4
    Join Date
    Nov 2003
    Posts
    1,902

    Re: Converting UTF-8 Strings to Unicode.

    A BSTR does not point to the 4-byte length prefix that precedes the string data; it points directly to the string data. You can use a BSTR just like a "wchar_t*" string.
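
    To make that layout concrete, here is a minimal, portable sketch that models it with plain calloc. It is not the real OLE Automation allocator: ModelBstr, modelAllocStringLen, modelStringLen and modelFreeString are hypothetical names made up for illustration, char16_t stands in for the 16-bit wchar_t used on Windows, and zero-filling when no source string is given is a simplification of the model.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for BSTR: a pointer to 16-bit code units.
using ModelBstr = char16_t*;

// Allocates [4-byte length in bytes][len code units][0 terminator] and returns
// a pointer to the first code unit, NOT to the length prefix.
// If src is null the characters are zero-filled (a simplification of this model).
ModelBstr modelAllocStringLen(const char16_t* src, std::uint32_t len)
{
    const std::uint32_t byteLen =
        static_cast<std::uint32_t>(len * sizeof(char16_t));
    const std::size_t total = sizeof(std::uint32_t) + byteLen + sizeof(char16_t);
    unsigned char* block = static_cast<unsigned char*>(std::calloc(1, total));
    if (!block)
        return nullptr;

    std::memcpy(block, &byteLen, sizeof(byteLen));      // length prefix
    unsigned char* data = block + sizeof(std::uint32_t);
    if (src)
        std::memcpy(data, src, byteLen);                // string data
    return reinterpret_cast<ModelBstr>(data);           // terminator is already 0
}

// Reads the prefix stored just before the string data and converts the byte
// count into a count of code units.
std::uint32_t modelStringLen(ModelBstr s)
{
    assert(s != nullptr);
    std::uint32_t byteLen = 0;
    std::memcpy(&byteLen,
                reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t),
                sizeof(byteLen));
    return static_cast<std::uint32_t>(byteLen / sizeof(char16_t));
}

// Frees the whole block, which starts at the prefix rather than at the pointer
// that was handed to the caller.
void modelFreeString(ModelBstr s)
{
    if (s)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

    In this model, s[0] of a string allocated from u"\u00DCBERSETZEN1" is the Ü itself, and the length lives in the four bytes before s. That is why writing a length into the first characters, as suggested above, would overwrite string data rather than fix up the prefix.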

    >> I have written some sample code and the behavior is consistent as above.
    Let's see it.

    Code:
    #include <windows.h>
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main()
    {
        const wchar_t W_U_WITH_DIAERESIS[] = L"\u00DC";
        const char UTF8_U_WITH_DIAERESIS[] = "\xC3\x9C";
        
        string str = UTF8_U_WITH_DIAERESIS;
        str += "BERSETZEN1";
    
        // get the length of the BSTR we need to allocate
        int len = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int)str.length(), 
                                      0, 0);
        if (!len)
        {
            cerr << "MultiByteToWideChar failed, ec = " << GetLastError() << endl;
            return 1;
        }//if
    
        BSTR bstr = SysAllocStringLen(0, len);
        if (!bstr)
        {
            // SysAllocStringLen returns NULL when the allocation fails
            cerr << "SysAllocStringLen failed (out of memory)" << endl;
            return 1;
        }//if
    
        if (!MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int)str.length(), 
                                 bstr, len))
        {
            cerr << "MultiByteToWideChar2 failed, ec = " << GetLastError() << endl;
            SysFreeString(bstr); // don't leak the BSTR on the error path
            return 1;
        }//if
    
        // see if it worked
        wstring wstr = W_U_WITH_DIAERESIS;
        wstr += L"BERSETZEN1";
    
        if (wstr == bstr)
            cout << "Worked!" << endl;
        else
            cout << "Failed!" << endl;
    
        SysFreeString(bstr);
        return 0;
    }//main
    Works for me.
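
    One possible explanation for the original symptom, and this is only a guess because the actual bytes in tmpBuffer were never posted: if the debugger shows "ÜBERSETZEN1" for a char*, the buffer may hold the single ANSI byte 0xDC rather than the UTF-8 sequence C3 9C used in the code above. 0xDC followed by 'B' is not valid UTF-8. Passing MB_ERR_INVALID_CHARS to MultiByteToWideChar should make it fail with ERROR_NO_UNICODE_TRANSLATION on such input instead of quietly altering the result. The portable sketch below checks well-formedness by hand; isValidUtf8 and checkThreadExample are hypothetical helper names, not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8: no stray or missing continuation
// bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
bool isValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after the lead
        unsigned long cp = 0;      // code point being decoded
        unsigned long minCp = 0;   // smallest value allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;         // stray continuation byte or invalid lead byte

        if (s.size() - i - 1 < extra)
            return false;          // sequence is cut off at the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;      // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;          // overlong, out of range, or a surrogate

        i += extra + 1;
    }
    return true;
}

// Self-check with the two byte strings discussed in this thread. The literals
// are split so that 'B' and 'E' are not read as part of the hex escape.
void checkThreadExample()
{
    assert(isValidUtf8("\xC3\x9C" "BERSETZEN1"));   // UTF-8 encoded U+00DC
    assert(!isValidUtf8("\xDC" "BERSETZEN1"));      // single ANSI byte for the same letter
}
```

    If a check like this rejects the bytes in tmpBuffer, the string is probably in the ANSI code page and would need CP_ACP (or the specific code page) rather than CP_UTF8.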

    gg
