Re: UTF8 Conversion problem
Hi.
If you stop the debugger at "WriteToFile("Test.txt", wText);", is the value of wText correct?
Re: UTF8 Conversion problem
What is res?
What is getString?
What does MultiByteToWideChar return?
What is WriteToFile?
Re: UTF8 Conversion problem
>> I am trying to convert a UTF8 string to a UNICODE string (That UNICODE string that is supported by wprintf %s).
Here is code for that: http://forums.codeguru.com/showthrea...31#post1790131
gg
Re: UTF8 Conversion problem
I'd bet 99% that the garbage comes from WriteToFile. Remember, you are writing Unicode text to the file. So, is the function Unicode-aware? Did you start the file with the proper BOM?
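For illustration, here is a minimal portable sketch of "start the file with a BOM", assuming the text is already held as UTF-16 in a std::u16string. The helper name write_utf16le_with_bom is hypothetical (it is not the poster's WriteToFileW). It writes the UTF-16LE byte-order mark FF FE, then each code unit low byte first, in binary mode.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes a UTF-16LE byte-order mark (FF FE) followed by
// the UTF-16 code units of `text`, each stored low byte first.
bool write_utf16le_with_bom(const std::string& path, const std::u16string& text)
{
    // Binary mode: text mode on Windows would turn every 0x0A byte into 0D 0A.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    const char bom[2] = { '\xFF', '\xFE' };
    out.write(bom, 2);
    for (char16_t unit : text)
    {
        const char bytes[2] = {
            static_cast<char>(unit & 0xFF),        // low byte first
            static_cast<char>((unit >> 8) & 0xFF)  // then the high byte
        };
        out.write(bytes, 2);
    }
    return static_cast<bool>(out);
}
```

Called with u"\u0E2A\u0E38\u0E14", this writes the bytes FF FE 2A 0E 38 0E 14 0E, and the leading FF FE is what lets Notepad recognize the file as UTF-16LE.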
Re: UTF8 Conversion problem
res and getString() are MySQL objects. They are used to retrieve the UTF-8 string, and they work as I mentioned in my post: I saved the UTF-8 string using WriteToFile(), opened the file in Notepad++ with UTF-8 encoding, and it looked perfect.
WriteToFile() & WriteToFileW():
PHP Code:
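// dlldir (a char buffer of at least 512 bytes) and GetDirectoryFile() are the poster's own helpers (not shown)
// Appends the raw bytes of a narrow string (no conversion, no null terminator)
inline void WriteToFile(char* pFileName, const char* text)
{
    GetModuleFileNameA(GetModuleHandleA(NULL), dlldir, 512);
    // Cut dlldir after the last backslash so only the executable's directory remains
    for (int i = (int)strlen(dlldir); i > 0; i--) { if (dlldir[i] == '\\') { dlldir[i+1] = 0; break; } }
    HANDLE hFile = CreateFileA(GetDirectoryFile(pFileName), GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE) // CreateFileA reports failure with INVALID_HANDLE_VALUE, not NULL
    {
        SetFilePointer(hFile, 0, 0, FILE_END); // append to the end of the file
        DWORD szOut;
        WriteFile(hFile, text, (DWORD)strlen(text), &szOut, NULL);
        CloseHandle(hFile);
    }
}
// Appends raw UTF-16LE code units (no BOM, no null terminator)
inline void WriteToFileW(char* pFileName, const wchar_t* text)
{
    GetModuleFileNameA(GetModuleHandleA(NULL), dlldir, 512);
    for (int i = (int)strlen(dlldir); i > 0; i--) { if (dlldir[i] == '\\') { dlldir[i+1] = 0; break; } }
    HANDLE hFile = CreateFileA(GetDirectoryFile(pFileName), GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE)
    {
        SetFilePointer(hFile, 0, 0, FILE_END);
        DWORD szOut;
        // Byte count = number of wide characters * size of one wide character
        WriteFile(hFile, text, (DWORD)(wcslen(text) * sizeof(wchar_t)), &szOut, NULL);
        CloseHandle(hFile);
    }
}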
I don't see why they wouldn't work.
Updated code:
PHP Code:
res = stmt->executeQuery("SELECT * FROM `members` WHERE `member_id` = 50");
while (res->next())
{
WCHAR wText[1000];
MultiByteToWideChar(CP_UTF8, 0, res->getString("title").c_str(), -1, wText, 1000);
WriteToFileW("Test.txt", wText);
}
UTF8 String: สุดสุดสุดสุดสุด
Output in Test.txt: *8*8*8*8*8
File Size: 30 bytes
Something is really wrong somewhere, and I can't find what.
Thank you for your help.
Re: UTF8 Conversion problem
Quote:
Originally Posted by
well90
res and GetString() are MySQL objects
And "MySQL objects" return std::string? :confused:
Or what is
Quote:
Code:
res->getString("title").c_str()
And please, don't use PHP tags for C++ code, use Code tags (#) instead.
Re: UTF8 Conversion problem
Quote:
Originally Posted by
well90
res and GetString() are MySQL objects,
Those are objects that do things in your code we still don't know enough about, for example c_str(). What is returned from that function?
To make this simple, replace the third parameter of that call to MultiByteToWideChar with a hard-coded string instead of those calls to functions we have little knowledge of. If you did that, does that hard-coded string convert correctly?
If it does output correctly, then the root of the problem is those getString() and c_str() calls you're making. The focus then moves away from the Windows calls and onto your MySQL function calls. If that string doesn't output correctly, then we can forget about all of that MySQL stuff and concentrate solely on getting a seemingly very simple string conversion to work correctly.
Regards,
Paul McKenzie
Re: UTF8 Conversion problem
I can't write Thai characters in a simple char[] array; the compiler replaces those characters with question marks.
This code:
Code:
char* szTest1 = "สุดสุดสุดสุดสุด";
CLogger::WriteToFile("Test1.txt", szTest1);
Output in Test1.txt: ???????????????
And the compiler complains about:
Warning 18 warning C4566: character represented by universal-character-name '\u0E14' cannot be represented in the current code page (1252) E:\VS\MySQL Client Test\MySQL Client Test\MySQL Client Test.cpp 52 1 MySQL Client Test
That's why I had to go through MySQL.
Now, about your doubts regarding the MySQL functions: as I mentioned in my previous post, I tested getString().c_str() by writing its return value directly to a file, and the file contains the correct Thai string. Therefore, all the MySQL functions work fine.
res->getString("title") returns a std::string object.
Re: UTF8 Conversion problem
OK, I have a better example, so you can stop doubting MySQL.
Code:
char* szTest1 = "t\xE9st"; // tést
WriteToFile("Test1.txt", szTest1);
I open Test1.txt in Notepad and I DO see "tést".
Now when i test this:
Code:
WCHAR wText[1000];
char* szTest1 = "t\xE9st"; // tést
MultiByteToWideChar(CP_UTF8, 0, szTest1, -1, wText, 1000);
WriteToFileW("Test1.txt", wText);
I open Test1.txt in Notepad and I see "t�st".
Now it's 100% not MySQL...
Anything I convert turns into gibberish, at least the non-ASCII UTF-8 characters.
Re: UTF8 Conversion problem
>> I open Test1.txt in notepad and i see "t�st".
The UTF8 encoding for small letter E with acute is "\xC3\xA9". You should be checking for error codes from functions that return them.
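To see where "\xC3\xA9" comes from: UTF-8 spreads the bits of a code point over 1 to 4 bytes (RFC 3629). Below is a minimal sketch with a hypothetical helper, encode_utf8 (not from any library used in this thread), which encodes U+00E9 as C3 A9 and the Thai U+0E2A as E0 B8 AA.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 (1 to 4 bytes).
// Surrogates (U+D800..U+DFFF) and values above U+10FFFF yield an empty string.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return std::string();          // lone surrogates cannot be encoded
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 0xE9 is binary 11101001. Its top two bits (11) go into the lead byte 110xxxxx, giving C3, and its low six bits (101001) go into the continuation byte 10xxxxxx, giving A9.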
Next you need to add a UTF16-LE BOM at the front of your log file - since that is what you're writing. Or you can tell notepad++ that the encoding is "UCS-2 Little Endian".
gg
Re: UTF8 Conversion problem
After telling Notepad++ that the encoding is "UCS-2 Little Endian", I see "t�st", except that the � is shown as a box.
Another test:
Code:
wprintf(L"Member Name: %ls \n", wText); // Prints 't?st'
Code:
wprintf(L"Member Name: %s \n", wText); // Prints 't?st'
WTH is going on...
Re: UTF8 Conversion problem
>> The UTF8 encoding for small letter E with acute is "\xC3\xA9".
Code:
const char* szTest1 = "t\xC3\xA9st"; // tést
>> Another test:
http://cboard.cprogramming.com/cplus...ml#post1086757
gg
Re: UTF8 Conversion problem
Quote:
Originally Posted by
well90
I cant write thai characters in a simple char[] object,
Yes, you can, if you know the escape codes. See how Codeplug writes string literals using hex escape codes.
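If the literal has to come from real data, a small helper can print a known-good UTF-8 string (for example, the std::string returned by getString()) as \xNN escapes ready to paste between quotes. Because the result is pure ASCII, the compiler's code page (1252 in the C4566 warning) no longer matters. This is a sketch, and to_hex_escapes is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: turns any byte string into the body of a C/C++ string
// literal made only of \xNN escapes, so it can be pasted into source code
// without depending on the source file's code page.
std::string to_hex_escapes(const std::string& bytes)
{
    std::string out;
    char buf[5]; // backslash, 'x', two hex digits, terminator
    for (unsigned char b : bytes)
    {
        std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

Since every byte becomes an escape, each escape is followed by another backslash or the closing quote. This avoids the pitfall where a \x escape swallows a following character that happens to be a hex digit.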
Regards,
Paul McKenzie
Re: UTF8 Conversion problem
Quote:
Originally Posted by
Codeplug
Weird, why does this site encode 'é' as 0xE9?
http://people.physics.anu.edu.au/~mx...pt/jsUTF8.html
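A likely explanation is that pages like that show the code point, U+00E9. For code points up to U+00FF, that number is also the single byte used by ISO-8859-1, and for é it is also the byte used by Windows-1252 (the ANSI code page named in the C4566 warning). So "t\xE9st" is Latin-1/ANSI text. Notepad displays it correctly, but it is not valid UTF-8, which is presumably why the CP_UTF8 conversion produced the replacement character in "t�st". Below is a minimal sketch of converting Latin-1 bytes to UTF-8 with a hypothetical helper, latin1_to_utf8. Windows-1252 assigns extra characters to 0x80-0x9F, and this sketch does not map them.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte equals its Unicode code point (U+0000..U+00FF), so bytes
// 0x80..0xFF become two-byte UTF-8 sequences. Windows-1252's extra
// characters in 0x80..0x9F are NOT handled here.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char b : latin1)
    {
        if (b < 0x80) {
            out += static_cast<char>(b);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // lead byte: C2 or C3
            out += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return out;
}
```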
Re: UTF8 Conversion problem
With your szTest1, I converted it to Unicode and wrote it to the file, and I can read the text successfully.
Now I tested with Thai text and I still get the same bad output.
Code:
WCHAR wText[1000];
char* szTest1 = "\xE0\xB8\xAA\xE0\xB8\xB8\xE0\xB8\x94"; // "สุด" (3 characters)
MultiByteToWideChar(CP_UTF8, 0, szTest1, -1, wText, 1000);
CLogger::WriteToFileW("Test1.txt", wText);
Original Text: "สุด" (3 characters, which form 2 visible symbols)
Output in Test1.txt: "*8"
Test1.txt file size: 6 Bytes
Please note that WriteToFileW writes the string without the null terminator, in case you're wondering about the file size.
I still don't understand why it doesn't work.
Re: UTF8 Conversion problem
It just doesn't know how to render the characters. If you open the file in a hex editor, you will see "2A0E 380E 140E" for those 3 characters - which is the little endian encoding for those 3 characters.
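Here is a small sketch of what that hex-editor view means, using a hypothetical helper, hex_dump_utf16le. Each UTF-16 code unit is stored low byte first. When the file is read as single-byte ANSI text with no BOM, the bytes 2A 0E 38 0E 14 0E show up as '*', an unprintable control character, '8', and two more control characters. That matches the "*8" output reported above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats UTF-16 code units the way a hex editor shows a
// UTF-16LE file, i.e. each unit as two bytes, low byte first (U+0E2A -> "2A0E").
std::string hex_dump_utf16le(const std::u16string& text)
{
    std::string out;
    char buf[5]; // four hex digits plus terminator
    for (char16_t unit : text)
    {
        if (!out.empty())
            out += ' ';
        std::snprintf(buf, sizeof buf, "%02X%02X",
                      static_cast<unsigned>(unit & 0xFF),   // low byte
                      static_cast<unsigned>(unit >> 8));    // high byte
        out += buf;
    }
    return out;
}
```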
http://www.fileformat.info/info/unic...0e2a/index.htm
http://www.fileformat.info/info/unic...0e38/index.htm
http://www.fileformat.info/info/unic...0e14/index.htm
gg
Re: UTF8 Conversion problem
Because you already saved your file in Unicode format.
Mine is in UTF-8.
UTF-8 uses 1 to 4 bytes per character (1 to 3 for anything in the Basic Multilingual Plane), and in my case all of my characters take 3 bytes.
After the conversion they should all take 2 bytes, like in your case.
Which brings me back to my starting point.
How do I convert this UTF-8 string (1 to 4 bytes per character) into a UTF-16 string (2 bytes per character here)?
Re: UTF8 Conversion problem
If you look at the web links you gave me, you will see that the first character in UTF-8 is 0xE0 0xB8 0xAA, just as I hard-coded it, and in UTF-16 it is 0E2A, just as you read from your file.
How do I convert this UTF-8 character into UTF-16?
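On Windows, MultiByteToWideChar(CP_UTF8, ...) already performs this conversion, and the 2A0E 380E 140E hex dump suggests it did so correctly. For illustration only, below is a portable sketch of the same UTF-8 to UTF-16 decoding. It uses a hypothetical helper, utf8_to_utf16, which rejects malformed input instead of substituting U+FFFD. That is roughly what passing MB_ERR_INVALID_CHARS asks MultiByteToWideChar to do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16. Returns false on malformed
// input (bad lead byte, missing continuation byte, truncated sequence,
// overlong form, surrogate, or value above U+10FFFF).
bool utf8_to_utf16(const std::string& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else return false;                     // stray continuation or invalid lead byte
        if (i + len > in.size())
            return false;                      // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;                  // expected 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);       // append 6 payload bits
        }
        // Reject overlong 3/4-byte forms, surrogates and out-of-range values.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // one UTF-16 code unit
        } else {                               // surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return true;
}
```

Fed the bytes E0 B8 AA E0 B8 B8 E0 B8 94, this yields the three code units 0E2A 0E38 0E14. Fed "t\xE9st", it returns false, because 0xE9 announces a 3-byte sequence but is followed by 's'.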
Re: UTF8 Conversion problem
Code:
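#define _CRT_SECURE_NO_WARNINGS // allow fopen without MSVC deprecation errors
#include <windows.h>
#include <stdio.h>
#include <wchar.h>
int main()
{
    WCHAR wText[1000];
    // UTF-8 bytes for "สุด" (U+0E2A U+0E38 U+0E14)
    const char* szTest1 = "\xE0\xB8\xAA\xE0\xB8\xB8\xE0\xB8\x94";
    // With -1 as the input length, returns the WCHAR count including the terminator, or 0 on error
    int res = MultiByteToWideChar(CP_UTF8, 0, szTest1, -1, wText, 1000);
    if (!res)
    {
        printf("MultiByteToWideChar failed, %lu\n", GetLastError());
        return 1;
    }
    // Binary mode: text mode would expand any 0x0A byte into 0x0D 0x0A and corrupt UTF-16 data
    FILE *f = fopen("WriteToFileW.txt", "wb");
    if (!f)
    {
        printf("Failed to open file\n");
        return 1;
    }
    const char* BOM = "\xFF\xFE"; // UTF16-LE
    fwrite(BOM, 1, 2, f);
    fwrite(wText, sizeof(WCHAR), wcslen(wText), f); // text only, no null terminator
    fclose(f);
    return 0;
}//main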
gg