UTF8 Conversion problem
Results 1 to 15 of 21

Thread: UTF8 Conversion problem

  1. #1
    Join Date
    Jun 2012
    Posts
    10

    UTF8 Conversion problem

    Hi,

    I am retrieving a string from a MySQL database as UTF-8.
    Of course, I checked that the string itself is OK by writing the retrieved string to a file and opening it in Notepad++ as UTF-8; it reads correctly.
    After using this code:

    PHP Code:
    res = stmt->executeQuery("SELECT * FROM `members` WHERE `member_id` = 50");
    while (res->next())
    {
        WCHAR wText[1000];
        MultiByteToWideChar(CP_UTF8, 0, res->getString("title").c_str(), -1, wText, 1000);
        WriteToFile("Test.txt", wText);
    }
    // res->getString("title").c_str() = UTF8 String which im talking about
    I open the file in Notepad++ and I see gibberish characters. I tried opening it with every available encoding and they all show gibberish.
    Am I doing something wrong in the conversion API call?
    I am trying to convert a UTF8 string to a UNICODE string (That UNICODE string that is supported by wprintf %s).

    Any help would be greatly appreciated.
    Thank you.

  2. #2
    Join Date
    Feb 2005
    Location
    Madrid (Spain)
    Posts
    511

    Re: UTF8 Conversion problem

    Hi.

    If you break in the debugger at "WriteToFile("Test.txt", wText);", is the value of wText correct?

  3. #3
    Join Date
    Jan 2003
    Location
    Wallisellen (ZH), Switzerland
    Posts
    17,418

    Re: UTF8 Conversion problem

    What is res?
    What is getString?
    What does MultiByteToWideChar return?
    What is WriteToFile?
    Victor Nijegorodov

  4. #4
    Join Date
    Nov 2003
    Posts
    1,800

    Re: UTF8 Conversion problem

    >> I am trying to convert a UTF8 string to a UNICODE string (That UNICODE string that is supported by wprintf %s).
    Here is code for that: http://forums.codeguru.com/showthrea...31#post1790131

    gg

  5. #5
    Join Date
    Nov 2000
    Location
    Voronezh, Russia
    Posts
    5,990

    Re: UTF8 Conversion problem

    I'm 99% sure the garbage is because of WriteToFile. Remember, you are writing Unicode text to a file. So, is the function Unicode-aware? Did you start the file with the proper BOM?
    Best regards,
    Igor

  6. #6
    Join Date
    Jun 2012
    Posts
    10

    Re: UTF8 Conversion problem

    res and GetString() are MySQL objects; they are used to retrieve the UTF-8 string, and they work as I mentioned in my post: I saved the UTF-8 string using WriteToFile() and then opened it in Notepad++ with UTF-8 encoding, and it works perfectly.

    WriteToFile() & WriteToFileW():
    PHP Code:
    inline void WriteToFile(char* pFileName, char* text)
    {
        GetModuleFileNameA(GetModuleHandleA(NULL), dlldir, 512);
        for(int i = strlen(dlldir); i > 0; i--) { if(dlldir[i] == '\\') { dlldir[i+1] = 0; break; } }

        HANDLE hFile = CreateFileA(GetDirectoryFile(pFileName), GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile)
        {
            SetFilePointer(hFile, 0, 0, FILE_END);

            DWORD szOut;
            WriteFile(hFile, (void*)text, strlen(text), &szOut, NULL);
            CloseHandle(hFile);
        }
    }

    inline void WriteToFileW(char* pFileName, wchar_t* text)
    {
        GetModuleFileNameA(GetModuleHandleA(NULL), dlldir, 512);
        for(int i = strlen(dlldir); i > 0; i--) { if(dlldir[i] == '\\') { dlldir[i+1] = 0; break; } }

        HANDLE hFile = CreateFileA(GetDirectoryFile(pFileName), GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile)
        {
            SetFilePointer(hFile, 0, 0, FILE_END);

            DWORD szOut;
            WriteFile(hFile, (void*)text, wcslen(text)*2, &szOut, NULL);
            CloseHandle(hFile);
        }
    }

    I don't see why they would not work.

    Updated code:
    PHP Code:
    res = stmt->executeQuery("SELECT * FROM `members` WHERE `member_id` = 50");
    while (res->next())
    {
        WCHAR wText[1000];
        MultiByteToWideChar(CP_UTF8, 0, res->getString("title").c_str(), -1, wText, 1000);
        WriteToFileW("Test.txt", wText);
    }

    UTF8 String: สุดสุดสุดสุดสุด
    Output in Test1.txt: *8*8*8*8*8
    File Size: 30 bytes

    Something is really wrong somewhere, but I can't find what.

    Thank you for your help.
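
    A note on the numbers above: 15 Thai characters stored as UTF-16 take exactly 30 bytes, and the first three code units (U+0E2A, U+0E38, U+0E14) written little-endian are the bytes 2A 0E 38 0E 14 0E, which a single-byte viewer shows as '*', a control character, '8', and so on. If that reading is right, the file may already hold correct UTF-16LE text that simply lacks a BOM. Below is a minimal portable sketch of prefixing the BOM. The helper names utf16le_with_bom and write_utf16le_file are hypothetical, and the sketch uses std::ofstream instead of the WriteFile call shown above.

    ```cpp
    #include <cassert>
    #include <fstream>
    #include <string>

    // Serialize UTF-16 code units as little-endian bytes, prefixed with the
    // UTF-16LE byte order mark (FF FE) so editors can detect the encoding.
    std::string utf16le_with_bom(const std::u16string& text)
    {
        std::string bytes = "\xFF\xFE";
        for (char16_t unit : text) {
            bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
            bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
        }
        return bytes;
    }

    // Overwrite `path` with the BOM-prefixed bytes; returns false on I/O failure.
    bool write_utf16le_file(const std::string& path, const std::u16string& text)
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        const std::string bytes = utf16le_with_bom(text);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        return static_cast<bool>(out);
    }
    ```

    Calling write_utf16le_file("Test1.txt", u"\u0E2A\u0E38\u0E14") would produce an 8-byte file: FF FE followed by the six bytes listed above. Note that this sketch truncates the file, whereas the functions above append; when appending, the BOM belongs only at the very start of the file.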

  7. #7
    Join Date
    Jan 2003
    Location
    Wallisellen (ZH), Switzerland
    Posts
    17,418

    Re: UTF8 Conversion problem

    Quote Originally Posted by well90 View Post
    res and GetString() are MySQL objects
    And "MySQL objects" return std::string?
    Or what is
    Code:
    res->getString("title").c_str()
    And please, don't use PHP tags for C++ code, use Code tags (#) instead.
    Victor Nijegorodov

  8. #8
    Join Date
    Apr 1999
    Posts
    27,430

    Re: UTF8 Conversion problem

    Quote Originally Posted by well90 View Post
    res and GetString() are MySQL objects,
    Those are objects that do things in your code we still don't know enough about, for example c_str(). What is returned from that function?

    To make this simple, replace the third parameter of that call to MultiByteToWideChar with a hard-coded string instead of those calls to functions we have little knowledge of. If you did that, does that hard-coded string convert correctly?

    If it does output correctly, then the root of the problem are those getString() and c_str() calls you're making. The focus is then moved away from the Windows calls and is now moved to your MySQL functions calls. If that string doesn't output correctly, then we can forget about all of that MySQL stuff and concentrate solely on getting a seemingly very simple string conversion to work correctly.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; June 25th, 2012 at 04:18 PM.

  9. #9
    Join Date
    Jun 2012
    Posts
    10

    Re: UTF8 Conversion problem

    Quote Originally Posted by Paul McKenzie View Post
    Those are objects that do things in your code we still don't know enough about, for example c_str(). What is returned from that function?

    To make this simple, replace the third parameter of that call to MultiByteToWideChar with a hard-coded string instead of those calls to functions we have little knowledge of. If you did that, does that hard-coded string convert correctly?

    If it does output correctly, then the root of the problem are those getString() and c_str() calls you're making. The focus is then moved away from the Windows calls and is now moved to your MySQL functions calls. If that string doesn't output correctly, then we can forget about all of that MySQL stuff and concentrate solely on getting a seemingly very simple string conversion to work correctly.

    Regards,

    Paul McKenzie
    I can't write Thai characters in a simple char[] object; the compiler replaces those characters with question marks.
    This code:

    Code:
    char* szTest1 = "สุดสุดสุดสุดสุด";
    CLogger::WriteToFile("Test1.txt", szTest1);
    Output in Test1.txt: ???????????????
    And the compiler complains about:
    Warning 18 warning C4566: character represented by universal-character-name '\u0E14' cannot be represented in the current code page (1252) E:\VS\MySQL Client Test\MySQL Client Test\MySQL Client Test.cpp 52 1 MySQL Client Test


    That's why I had to post the MySQL version.

    Now, about your doubts regarding the MySQL functions: as I've mentioned in my previous post, I tested getString().c_str() by writing its returned value directly into a file, and the file has the correct Thai string. Therefore, all the MySQL functions work fine.

    res->getString("title") returns a std::string object.

  10. #10
    Join Date
    Jun 2012
    Posts
    10

    Re: UTF8 Conversion problem

    OK, I have a better example, so you can stop doubting MySQL.

    Code:
    char* szTest1 = "t\xE9st"; // tést
    WriteToFile("Test1.txt", szTest1);
    I open Test1.txt in Notepad and I DO SEE "tést".

    Now when I test this:
    Code:
    WCHAR wText[1000];
    char* szTest1 = "t\xE9st"; // tést
    MultiByteToWideChar(CP_UTF8, 0, szTest1, -1, wText, 1000);
    WriteToFileW("Test1.txt", wText);
    I open Test1.txt in notepad and i see "t�st".

    Now it's 100% not MySQL...
    Anything that I convert turns into gibberish... at least UTF-8 characters.

  11. #11
    Join Date
    Nov 2003
    Posts
    1,800

    Re: UTF8 Conversion problem

    >> I open Test1.txt in notepad and i see "t�st".
    The UTF8 encoding for small letter E with acute is "\xC3\xA9". You should be checking for error codes from functions that return them.
    Next you need to add a UTF16-LE BOM at the front of your log file - since that is what you're writing. Or you can tell notepad++ that the encoding is "UCS-2 Little Endian".

    gg
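
    To illustrate why checking the result matters: a strict decoder rejects "t\xE9st", because 0xE9 announces a three-byte sequence but is followed by 's' rather than continuation bytes. The sketch below is a hand-written portable decoder, not MultiByteToWideChar; the name utf8_to_utf16 and its bool-returning interface are assumptions made for illustration. On Windows, the closest equivalent should be passing MB_ERR_INVALID_CHARS to MultiByteToWideChar and treating a return value of 0 as failure.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Strictly decode UTF-8 into UTF-16 code units.
    // Returns false (leaving `out` empty) on any malformed sequence.
    bool utf8_to_utf16(const std::string& in, std::u16string& out)
    {
        out.clear();
        std::size_t i = 0;
        while (i < in.size()) {
            const unsigned char lead = static_cast<unsigned char>(in[i]);
            std::uint32_t cp = 0;
            std::size_t extra = 0;     // continuation bytes expected after the lead
            std::uint32_t min_cp = 0;  // smallest code point allowed at this length
            if (lead < 0x80)                { cp = lead; }
            else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
            else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
            else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
            else { out.clear(); return false; }  // stray continuation or invalid lead

            if (in.size() - i <= extra) { out.clear(); return false; }  // truncated
            for (std::size_t k = 1; k <= extra; ++k) {
                const unsigned char c = static_cast<unsigned char>(in[i + k]);
                if ((c & 0xC0) != 0x80) { out.clear(); return false; }
                cp = (cp << 6) | (c & 0x3F);
            }
            // Reject overlong forms, surrogate code points, and values past U+10FFFF.
            if (cp < min_cp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
                out.clear();
                return false;
            }
            if (cp < 0x10000) {
                out.push_back(static_cast<char16_t>(cp));
            } else {  // encode as a surrogate pair
                cp -= 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
            i += extra + 1;
        }
        return true;
    }
    ```

    With this sketch, utf8_to_utf16("t\xE9st", w) returns false, while utf8_to_utf16("t\xC3\xA9st", w) returns true and leaves four code units in w. A lenient decoder that substitutes U+FFFD instead of failing would be consistent with the "t�st" output reported above.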

  12. #12
    Join Date
    Jun 2012
    Posts
    10

    Re: UTF8 Conversion problem

    After telling Notepad++ that the encoding is "UCS-2 Little Endian", I see this "t�st", except the � is a cube.

    Another test:

    Code:
    wprintf(L"Member Name: %ls \n", wText); // Prints 't?st'
    Code:
    wprintf(L"Member Name: %s \n", wText); // Prints 't?st'
    WTH is going on...

  13. #13
    Join Date
    Nov 2003
    Posts
    1,800

    Re: UTF8 Conversion problem

    >> The UTF8 encoding for small letter E with acute is "\xC3\xA9".
    Code:
    const char* szTest1 = "t\xC3\xA9st"; // tést
    >> Another test:
    http://cboard.cprogramming.com/cplus...ml#post1086757

    gg
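
    As a sketch of where "\xC3\xA9" comes from: the code point U+00E9 is above 0x7F, so UTF-8 spreads its bits over two bytes (110xxxxx 10xxxxxx). The single byte 0xE9 is the Latin-1 / Windows-1252 encoding of the same letter, and it also happens to equal the code point number, which is an easy source of confusion. The helper name encode_utf8 is hypothetical.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <string>

    // Encode one Unicode code point as UTF-8.
    // Returns an empty string for surrogates and values above U+10FFFF.
    std::string encode_utf8(std::uint32_t cp)
    {
        std::string out;
        if (cp < 0x80) {                       // 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
            if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp <= 0x10FFFF) {           // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        return out;
    }
    ```

    Here encode_utf8(0xE9) yields the two bytes C3 A9, and encode_utf8(0x0E14) yields E0 B8 94 for the Thai letter named in the C4566 warning earlier in the thread.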

  14. #14
    Join Date
    Apr 1999
    Posts
    27,430

    Re: UTF8 Conversion problem

    Quote Originally Posted by well90 View Post
    I can't write Thai characters in a simple char[] object,
    Yes you can if you know the escape codes. See how CodePlug writes string literals using the hex escape codes.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; June 25th, 2012 at 07:40 PM.
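
    For the Thai text in this thread, the UTF-8 bytes can be spelled out with \x escapes so that the source file contains only ASCII, which should avoid warning C4566. The sketch below shows one syllable written that way, plus a small helper that turns arbitrary bytes into such escapes; the names kSut and to_hex_escapes are hypothetical.

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <string>

    // One "สุด" syllable as UTF-8: U+0E2A, U+0E38, U+0E14 (three bytes each).
    const char kSut[] = "\xE0\xB8\xAA\xE0\xB8\xB8\xE0\xB8\x94";

    // Render raw bytes as \xHH escapes that can be pasted into a narrow string literal.
    std::string to_hex_escapes(const std::string& bytes)
    {
        std::string out;
        char buf[5];  // backslash, 'x', two hex digits, terminating null
        for (unsigned char b : bytes) {
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
            out += buf;
        }
        return out;
    }
    ```

    One caveat when writing such literals by hand: a \x escape absorbs every hex digit that follows it, so if the next literal character is a hex digit, split the string ("\xC3\xA9" "d") to keep the escape from swallowing it.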

  15. #15
    Join Date
    Jun 2012
    Posts
    10

    Re: UTF8 Conversion problem

    Quote Originally Posted by Codeplug View Post
    >> The UTF8 encoding for small letter E with acute is "\xC3\xA9".
    Code:
    const char* szTest1 = "t\xC3\xA9st"; // tést
    >> Another test:
    http://cboard.cprogramming.com/cplus...ml#post1086757

    gg
    Weird, why does this site encode that 'é' as 0xE9?
    http://people.physics.anu.edu.au/~mx...pt/jsUTF8.html
