
Thread: UTF-8 questions

  1. #1
    Join Date: Feb 2007, Posts: 16

    UTF-8 questions

    Hi!

    I am having trouble creating an app that understands UTF-8.

    I have #define'd UNICODE and _UNICODE at the beginning.
    I have changed char to TCHAR, and so on.
    I am starting with wWinMain instead of WinMain.
    I am using #pragma code_page(65001) for menus etc. in different languages.

    I am building with Borland.

    If anyone is interested, these are my build commands:
    bcc32.exe -tW -WU -DUNICODE -D_UNICODE %1.cpp
    brcc32.exe -w32 %1.rc
    ilink32.exe -aa c0w32w %1.obj,%1,,import32 cw32i,,%1.res
    %1.exe

    I have checked with IsWindowUnicode(), and all the windows are Unicode.

    What's my problem?

    I can read text files saved in UTF-8 format into an edit field (EM_STREAMIN) with no problem. But if I type something into an edit field, get it with a WM_GETTEXT message and set it in another field with WM_SETTEXT, it doesn't work. It seems like the control doesn't understand that the text is UTF-8 and treats it as ANSI. What am I missing?

    Any help is really appreciated.

    Thomas

  2. #2
    Join Date: Nov 2003, Posts: 1,902

    Re: UTF-8 questions

    >> I have defined with #define UNICODE/_UNICODE in the beginning.
    Good. This makes the Win32 APIs that deal with strings use wchar_t strings (UTF-16LE encoded).

    >> I have changed from char to TCHAR and so on
    Forget TCHARs. They are only useful if you need to build the same source for both an MBCS code page and Unicode (UTF-16LE). If you aren't supporting legacy MBCS code, just use wchar_t for UTF-16LE strings and char for UTF-8 strings.
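    To see why TCHAR only matters for dual MBCS/Unicode builds, here is a simplified imitation of how winnt.h picks TCHAR and TEXT() from the UNICODE macro. MY_TCHAR, MY_TEXT and MyTcsLen are hypothetical names so the sketch compiles on any platform; the real headers map each string API the same way (e.g. SetWindowText becomes SetWindowTextW or SetWindowTextA).

```cpp
#include <cassert>
#include <cstddef>

// Simplified imitation of winnt.h's TCHAR / TEXT() (hypothetical names
// MY_TCHAR / MY_TEXT so this compiles anywhere).
// The character type is fixed at compile time by the UNICODE macro.
#ifdef UNICODE
typedef wchar_t MY_TCHAR;            // UTF-16 code unit on Windows
#define MY_TEXT_(s) L##s             // "abc" -> L"abc"
#else
typedef char MY_TCHAR;               // byte of ANSI/MBCS text
#define MY_TEXT_(s) s                // "abc" stays "abc"
#endif
#define MY_TEXT(s) MY_TEXT_(s)       // extra level so macro arguments expand first

// Length of a MY_TCHAR string; the same source works in either build mode.
std::size_t MyTcsLen(const MY_TCHAR* s)
{
    std::size_t n = 0;
    while (s[n] != MY_TEXT('\0')) ++n;
    return n;
}
```

    Compiled with -DUNICODE (as in the bcc32 command line in the first post), MY_TCHAR is wchar_t; without it, it is char. That compile-time switch is all TCHAR gives you, so a Unicode-only app loses nothing by using wchar_t directly.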

    >> I am using #pragma code_page(65001) for menus etc. in different languages.
    CP_UTF8 (65001) isn't a real code page. It's just an identifier that was added so that MultiByteToWideChar() (and WideCharToMultiByte()) could support UTF-8 to UTF-16 conversions. The typical solution is to save your .rc file as UTF-16LE and remove any code_page pragmas.

    Having said that, the #pragma code_page(65001) technique does seem to work, at least with version 6.1.6723.1 of the MS resource compiler and a simple string table.
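    MultiByteToWideChar(CP_UTF8, ...) is what you use when you have UTF-8 bytes (e.g. read from a file) and need the UTF-16 text that the W APIs expect. As a rough, portable illustration of what that conversion does, here is a hand-written UTF-8 to UTF-16 decoder. Utf8ToUtf16 is a hypothetical helper name and std::u16string stands in for a wchar_t buffer; on Windows you would normally just call MultiByteToWideChar. Like passing MB_ERR_INVALID_CHARS, this sketch rejects malformed input instead of substituting U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units, roughly
// what MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...) does.
// Throws std::runtime_error on malformed input.
std::u16string Utf8ToUtf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const std::uint32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x6)  { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0xE)  { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c >> 6) != 0x2) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and values past U+10FFFF.
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));            // one code unit
        } else {
            cp -= 0x10000;                                        // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

    Note that characters outside the Basic Multilingual Plane (e.g. U+1F600) become two UTF-16 code units, so a wchar_t "character count" is not always a count of characters.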

    >> It [the edit field] seems like it doesn't understand that the text is UTF-8, and treats it like ANSI.
    It isn't UTF-8. Windows controls don't deal in UTF-8; their text is either Unicode (UTF-16LE) or ANSI text encoded in some code page.
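    So in a Unicode build the usual pattern is: get the control's text as UTF-16 (WM_GETTEXT into a wchar_t buffer), hand that same UTF-16 text to WM_SETTEXT unchanged, and convert to UTF-8 only where the text leaves the program (e.g. when saving a file), typically with WideCharToMultiByte(CP_UTF8, ...). Below is a portable sketch of that last step. Utf16ToUtf8 is a hypothetical name and std::u16string stands in for the wchar_t buffer; it throws on unpaired surrogates, comparable to passing WC_ERR_INVALID_CHARS, rather than substituting U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-16 code units (e.g. text from WM_GETTEXT
// in a Unicode build) as UTF-8, roughly what WideCharToMultiByte(CP_UTF8, ...)
// does. Throws std::runtime_error on an unpaired surrogate.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {                 // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                             // consumed the low surrogate
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {                                     // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                             // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                           // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                             // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```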

    gg

  3. #3
    Join Date: Feb 2007, Posts: 16

    Re: UTF-8 questions

    Thanks a lot for the help.
