UTF-8 questions


toto1919
January 11th, 2010, 07:40 AM
Hi!

I am having problems creating an app which understands UTF-8.

I have added #define UNICODE and #define _UNICODE at the beginning.
I have changed from char to TCHAR and so on.
I am starting with wWinMain instead of WinMain.
I am using #pragma code_page(65001) for menus etc. in different languages.

I am building with Borland.

If anyone is interested, these are my build commands:
bcc32.exe -tW -WU -DUNICODE -D_UNICODE %1.cpp
brcc32.exe -w32 %1.rc
ilink32.exe -aa c0w32w %1.obj,%1,,import32 cw32i,,%1.res
%1.exe

I have checked with IsWindowUnicode(), and all the windows are Unicode.

What's my problem?

I can read text files saved in UTF-8 format into an edit field (EM_STREAMIN) with no problem. But if I type something into an edit field, get it with a WM_GETTEXT message and set it in another field with WM_SETTEXT, it doesn't work. It seems like the control doesn't understand that the text is UTF-8 and treats it as ANSI. What am I missing?

Any help is really appreciated.

Thomas

Codeplug
January 12th, 2010, 09:13 AM
>> I have added #define UNICODE and #define _UNICODE at the beginning.
Good. This makes the Win32 APIs that deal with strings use wchar_t strings (UTF-16LE encoded); for example, SetWindowText maps to SetWindowTextW.

>> I have changed from char to TCHAR and so on
Forget TCHARs. They're only useful if you need to support both an MBCS codepage and Unicode (UTF-16LE) at the same time. If you aren't supporting legacy MBCS code, just use wchar_t for UTF-16LE strings and char for UTF-8 strings.

>> I am using #pragma code_page(65001) for menus etc. in different languages.
CP_UTF8 (65001) isn't a real codepage. It's just an identifier that was added so that MultiByteToWideChar() could support UTF-8 to UTF-16 conversions. The typical solution is to save your .rc file as UTF-16LE and remove any code_page pragmas.

Having said that, the code_page(65001) pragma does seem to work (at least with version 6.1.6723.1 of the MS resource compiler, using a simple string table).
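To illustrate what a CP_UTF8 conversion actually does, here is a minimal portable sketch of the UTF-8 to UTF-16 step that MultiByteToWideChar(CP_UTF8, ...) performs on Windows. The function name Utf8ToUtf16 is hypothetical, and it uses char16_t instead of wchar_t so that it also builds where wchar_t is 32 bits; in a real Win32 app you would call the API rather than this code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration of UTF-8 -> UTF-16 decoding (what
// MultiByteToWideChar(CP_UTF8, ...) does on Windows). Throws
// std::runtime_error on malformed input instead of substituting U+FFFD.
std::u16string Utf8ToUtf16(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x6)  { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if ((lead >> 4) == 0xE)  { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x2) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        // Reject overlong forms, UTF-16 surrogate values and values above U+10FFFF.
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // one UTF-16 unit (BMP)
        } else {
            cp -= 0x10000;                               // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, the two UTF-8 bytes C3 A9 ("é") become the single UTF-16 unit 0x00E9, while the four bytes F0 9F 98 80 (U+1F600) become the surrogate pair 0xD83D 0xDE00.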

>> It [the edit field] seems like it doesn't understand that the text is UTF-8, and treats it like ANSI.
It isn't UTF-8. Windows controls don't deal in UTF-8; their text is either Unicode (UTF-16LE) or ANSI text encoded in some codepage.
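Because the edit controls hold UTF-16, text read with WM_GETTEXT only becomes UTF-8 if you convert it yourself, for instance before writing it back to a UTF-8 file. On Windows that is WideCharToMultiByte(CP_UTF8, ...). The sketch below is a hypothetical portable Utf16ToUtf8 helper that shows what that conversion does, using char16_t in place of Windows' 16-bit wchar_t; it is an illustration, not a replacement for the API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration of UTF-16 -> UTF-8 encoding (what
// WideCharToMultiByte(CP_UTF8, ...) does on Windows).
// Throws std::runtime_error on unpaired surrogates.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate too
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {              // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {      // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                      // 4 bytes: 11110xxx then three continuation bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The key design point is that the UTF-16 buffer is what the controls exchange; UTF-8 only appears at the edges of the program (files, network), where one explicit conversion is made.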

gg

toto1919
January 17th, 2010, 02:05 PM
Thanks a lot for the help.