  1. #1
    Join Date
    Apr 1999
    Posts
    123

    Code page in WideCharToMultiByte

    I am using _getmbcp() to get the current code page used in WideCharToMultiByte. By the way, it is a _UNICODE compile.

    I am trying to convert Kanji Unicode text to multibyte. Somewhere along the line I am not getting the correct characters, but the fault could lie elsewhere.

    Is this the correct code page for this situation?
    Last edited by Bob H; September 26th, 2002 at 11:06 AM.

  2. #2
    Join Date
    Apr 1999
    Posts
    123
    _getmbcp returns the current multibyte code page.

    Another possibly related issue:
    I thought that all Unicode strings could be translated into a multibyte string of 2 bytes per character using WideCharToMultiByte. But in one of my books on Unicode, I see a table showing the number of bytes needed to encode UTF-8 characters, and the number goes up to 4. Does anyone know if Kanji uses more than 2 bytes?
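
    As a rough illustration of the byte counts in question: the size of a converted character depends on the target encoding, not on WideCharToMultiByte itself. In a double-byte code page such as 932 (Shift-JIS), kanji take 2 bytes; in UTF-8 (CP_UTF8), common kanji take 3 bytes and characters outside the Basic Multilingual Plane take 4. The sketch below is portable C++ and does not call the Windows API; the function names utf8_length and utf16_units are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed to encode one Unicode code point in UTF-8.
// Note: surrogate values (0xD800-0xDFFF) are not valid code points on
// their own; this sketch does not reject them.
inline int utf8_length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);          // upper limit of the Unicode range
    if (cp < 0x80)    return 1;      // ASCII
    if (cp < 0x800)   return 2;      // e.g. Greek, Cyrillic, Hebrew, Arabic
    if (cp < 0x10000) return 3;      // rest of the BMP, including common kanji
    return 4;                        // supplementary planes
}

// Number of 16-bit code units needed to encode one code point in UTF-16.
inline int utf16_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;     // 2 means a surrogate pair
}
```

    For example, utf8_length(0x6F22) (the kanji U+6F22) is 3 while utf16_units(0x6F22) is 1, so a buffer sized at 2 bytes per character would be too small for UTF-8 output.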

  3. #3
    Join Date
    Apr 1999
    Posts
    123
    I read your reply to another unicode question on the forum which was helpful.

    I have created a test program for my contact in Japan to run with Windows 2000.

    It displays text which he enters in a CEdit box and in another CEdit box the length of the CString string holding the text is displayed. The simple test program is unicode compiled.

    By the way, the code runs great with MS Mincho (with the code page set to Japan) on my XP machine, but fails when it runs on Windows XP and Windows 2000 computers in Japan.

    So, I presume if the length of text equals the length of the string, we are in the UTF-16 mode. If we are not, then I am in deep trouble. My software assumes one 2-byte TCHAR per character. The code uses macros like _istlead and _tcsinc.

    Is the UTF value set by the font or the operating system and is there a way to test for it and/or set it?
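
    One way to check the "one TCHAR per character" assumption is sketched below, with some hedging. Wide strings on Windows 2000 and later are UTF-16 regardless of the font, so CString::GetLength() in a _UNICODE build counts 16-bit units, not characters. The two differ only when the text contains surrogate pairs (characters above U+FFFF); ordinary kanji are one unit each. The sketch uses portable std::u16string rather than CString, and the helper names are invented for this example.

```cpp
#include <cstddef>
#include <string>

// True if the unit is the first (high) half of a surrogate pair.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// True if the unit is the second (low) half of a surrogate pair.
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Count code points in a UTF-16 string. A well-formed surrogate pair
// counts once; an unpaired surrogate is counted as one item.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the low surrogate of the pair
        }
        ++n;
    }
    return n;
}
```

    If count_code_points(s) equals s.size() for the test input, the text contains no surrogate pairs and the one-unit-per-character assumption holds for that input.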

  4. #4
    Join Date
    Sep 2002
    Posts
    13
    See GetFontUnicodeRanges( ) in MSDN.
    Waqar

  5. #5
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Hum...
    _getmbcp returns the current multibyte code page. A return value of 0 indicates that a single byte code page is in use.
    That is not exactly what you should use: a return value of 0 does not tell you which single-byte code page is in use, and it may not be Latin-1 as it is for English.

    Another possibility is to use the Windows API calls which work fine for me.
    Code:
    UINT LangIDToCodePage(long lLangID)
    {
     // ANSI (char) version: the code page comes back as a decimal string, e.g. "932"
     char codepage[7];
     int Res;
    
     memset(codepage, 0, sizeof(codepage));
     Res = GetLocaleInfo(lLangID, LOCALE_IDEFAULTANSICODEPAGE, codepage, 6);
     if (Res != 0) {
      return atoi(codepage);
     } else {
      return CP_ACP; // lookup failed: fall back to the system ANSI code page
     }
    }
    ...
    // On startup do : // for me in OnCreate
    m_InputCP = LangIDToCodePage(LOWORD(GetKeyboardLayout(0)));
    // In your message handler do :
     case WM_INPUTLANGCHANGE :
      // lParam carries the new input locale identifier (HKL)
      m_InputCP = LangIDToCodePage(LOWORD(lParam));
      bHandled = TRUE;
      break;
    // When you convert from Unicode to MultiByte, use m_InputCP as the codepage
    Last edited by Yves M; September 27th, 2002 at 09:15 AM.

  6. #6
    Join Date
    Apr 1999
    Posts
    123
    I can't imagine that a single byte code page would be the situation since the problem is occurring on Win 2000 computers in Japan. But, I will create a test dialog which displays the value of your routine and _getmbcp().

  7. #7
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    True, it would not be related to your problem with Japanese, but in Russian, Greek, Arabic, Hebrew, etc. things wouldn't work.

    Oh yes, by the way you will have to rewrite my LangIDToCodePage function for Unicode since you compile your app for Unicode.
    Last edited by Yves M; September 28th, 2002 at 07:39 AM.

  8. #8
    Join Date
    Apr 1999
    Posts
    123
    The code also services ANSI purposes -- English, German, etc. -- and Win 9x computers so I need to go the TCHAR/_MBCS route. There will be a separate _MBCS build for 9x computers which by the way works correctly on Japanese computers. It is the unicode version which has problems which are probably due to my mapping between text characters and text glyphs.

    I don't have sufficient resources to maintain a different code base for this Unicode Japanese application. Also, I don't want to rewrite all the MFC controls, which I believe use CString.

    Evidence so far is that there is one TCHAR per text character.

    I have a bastardized GetGlyphIndex function which was inherited from the _MBCS world (and works for that world). I need to try the true unicode call for this function and I think my problem may be solved.

  9. #9
    Join Date
    Apr 1999
    Posts
    123
    Since the last posting I have figured out my problems and learned some things.

    First, the LangIDToCodePage code returns the same value as _getmbcp().

    Second, in my _MBCS compile I was passing what I believe are called character codes to GetGlyphOutline. This does not work in general for a _UNICODE compile. When I used glyph indices (which for ASCII codes < 127 differ from the character codes by 29) and set the GGO_GLYPH_INDEX flag in GetGlyphOutline, my problem went away. I used GetCharacterPlacement to get the glyph indices.

  10. #10
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Originally posted by Bob H
    First, the LangIDToCodePage code returns the same value as _getmbcp().
    Does that mean that _getmbcp also works correctly when you switch code pages while the program is executing? That is, if a Japanese person needs to insert some characters in English, Russian or whatever, and switches keyboard locales while your program is running?

  11. #11
    Join Date
    Apr 1999
    Posts
    123
    I am fairly sure that English can be entered from a Japanese keyboard. I get a lot of emails from Japan in English.

  12. #12
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Well, I can also enter Japanese on my Spanish or my Swiss keyboards when I switch input locales.
