Code page in WideCharToMultiByte

**Bob H** · September 26th, 2002, 11:03 AM

I am using _getmbcp() to get the current code page used in WideCharToMultiByte. By the way, it is a _UNICODE compile.

I am trying to convert Kanji Unicode to multibyte. Somewhere along the line I am not getting the correct characters, but the problem could be elsewhere.

Is this the correct code page for this situation?

**Bob H** · September 26th, 2002, 01:21 PM

_getmbcp returns the current multibyte code page.

Another possibly related issue:
I thought that all Unicode strings could be translated into a 2-bytes-per-character multibyte string using WideCharToMultiByte. But in one of my books on Unicode, I see a table showing the number of bytes needed to encode UTF-8 characters, and the number goes up to 4. Does anyone know if Kanji uses more than 2 bytes?
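For what it is worth, the two byte counts belong to different encodings. The size of the WideCharToMultiByte output depends on the code page passed in: with the Japanese code page 932 a kanji takes two bytes, while with CP_UTF8 a kanji in the Basic Multilingual Plane takes three, and the rarer ones outside it take four. A minimal sketch of the UTF-8 ranges, where `Utf8Length` is a made-up helper name and not part of the Windows API or the CRT:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Utf8Length is an illustrative helper, not a Windows or CRT function.
inline std::size_t Utf8Length(std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);        // highest valid Unicode code point
    if (codePoint < 0x80)    return 1;    // ASCII
    if (codePoint < 0x800)   return 2;    // e.g. Latin-1 supplement, Greek, Cyrillic
    if (codePoint < 0x10000) return 3;    // rest of the BMP, including most kanji
    return 4;                             // supplementary planes
}
```

So U+6F22 (a common kanji) needs three bytes in UTF-8, even though the same character is two bytes in code page 932.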

**Bob H** · September 27th, 2002, 04:14 AM

I read your reply to another unicode question on the forum which was helpful.

I have created a test program for my contact in Japan to run with Windows 2000.

It displays text which he enters in a CEdit box and in another CEdit box the length of the CString string holding the text is displayed. The simple test program is unicode compiled.

By the way, the code runs great with MS Mincho (with the code page set to Japan) on my XP machine but fails when it runs on Windows XP and Windows 2000 computers in Japan.

So, I presume if the length of text equals the length of the string, we are in the UTF-16 mode. If we are not, then I am in deep trouble. My software assumes one 2-byte TCHAR per character. The code uses macros like _istlead and _tcsinc.
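One way to check the "one 2-byte unit per character" assumption is to compare the number of UTF-16 code units with the number of code points. The sketch below is portable C++ that uses `char16_t` in place of the 16-bit Windows `wchar_t`; the helper names are hypothetical. Characters outside the Basic Multilingual Plane occupy two units (a surrogate pair), so the two counts differ only when such characters are present.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Surrogate ranges defined by UTF-16.
inline bool IsHighSurrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
inline bool IsLowSurrogate(char16_t unit)  { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Combine a surrogate pair into the code point it represents.
inline char32_t CombineSurrogates(char16_t high, char16_t low)
{
    assert(IsHighSurrogate(high) && IsLowSurrogate(low));
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}

// Count code points; a well-formed surrogate pair counts as one.
inline std::size_t CountCodePoints(const std::u16string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (IsHighSurrogate(text[i]) && i + 1 < text.size()
            && IsLowSurrogate(text[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++count;
    }
    return count;
}
```

If `CountCodePoints(text)` equals `text.size()` for the strings in question, the one-unit-per-character assumption holds for that data. Everyday kanji are in the BMP, so this is the usual case.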

Is the UTF value set by the font or the operating system and is there a way to test for it and/or set it?

**waqarahmad** · September 27th, 2002, 07:15 AM

See GetFontUnicodeRanges( ) in MSDN.

**Yves M** · September 27th, 2002, 08:44 AM

Hum...

_getmbcp returns the current multibyte code page. A return value of 0 indicates that a single byte code page is in use.

That is not exactly what you should use, since a return value of 0 only tells you that some single-byte code page is in use, and that code page is not necessarily Latin-1 as it is for English.

Another possibility is to use the Windows API calls which work fine for me.

Code:

UINT LangIDToCodePage(long lLangID)
{
 char codepage[7];
 int Res;

 memset(codepage, 0, 7);
 Res = GetLocaleInfo(lLangID, LOCALE_IDEFAULTANSICODEPAGE, codepage, 6);
 if (Res != 0) {
  return atoi(codepage);
 } else {
  return CP_ACP;
 }
}
...
// On startup do : // for me in OnCreate
m_InputCP = LangIDToCodePage(LOWORD(GetKeyboardLayout(0)));
// In your message handler do :
 case WM_INPUTLANGCHANGE :
  m_InputCP = LangIDToCodePage(LOWORD(lParam));
  bHandled = TRUE;
  break;
// When you convert from Unicode to MultiByte, use m_InputCP as the code page

**Bob H** · September 28th, 2002, 06:41 AM

I can't imagine that a single byte code page would be the situation since the problem is occurring on Win 2000 computers in Japan. But, I will create a test dialog which displays the value of your routine and _getmbcp().

**Yves M** · September 28th, 2002, 07:36 AM

True, it would not be related to your problem with Japanese, but in Russian, Greek, Arabic, Hebrew, etc. things wouldn't work.

Oh yes, by the way you will have to rewrite my LangIDToCodePage function for Unicode since you compile your app for Unicode.
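A Unicode version could look roughly like the sketch below. It assumes the wide-character `GetLocaleInfoW` call and splits the digit parsing into a small portable helper; `ParseCodePage` and `LangIDToCodePageW` are names invented for this example. The Windows-specific part is guarded so the helper can be tried on its own.

```cpp
#include <cassert>
#include <cwchar>
#ifdef _WIN32
#include <windows.h>
#endif

// Convert the digit string written by GetLocaleInfoW (for example L"932")
// into a numeric code page. Hypothetical helper for this sketch.
inline unsigned ParseCodePage(const wchar_t* text)
{
    assert(text != nullptr);
    return static_cast<unsigned>(std::wcstoul(text, nullptr, 10));
}

#ifdef _WIN32
// Wide-character counterpart of LangIDToCodePage.
inline UINT LangIDToCodePageW(LANGID langId)
{
    wchar_t codepage[7] = {};  // up to 5 digits plus terminator, with one spare
    int res = GetLocaleInfoW(MAKELCID(langId, SORT_DEFAULT),
                             LOCALE_IDEFAULTANSICODEPAGE,
                             codepage, 7);
    return res != 0 ? ParseCodePage(codepage) : CP_ACP;  // fall back to the ANSI code page
}
#endif
```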

**Bob H** · September 28th, 2002, 03:31 PM

The code also services ANSI purposes -- English, German, etc. -- and Win 9x computers so I need to go the TCHAR/_MBCS route. There will be a separate _MBCS build for 9x computers which by the way works correctly on Japanese computers. It is the unicode version which has problems which are probably due to my mapping between text characters and text glyphs.

I don't have sufficient resources to maintain a different code base for this Unicode Japanese application. Also, I believe I would otherwise have to rewrite all the MFC controls which use CString, and I don't want to do that.

Evidence so far is that there is one TCHAR per text character.

I have a bastardized GetGlyphIndex function which was inherited from the _MBCS world (and works for that world). I need to try the true unicode call for this function and I think my problem may be solved.

**Bob H** · October 12th, 2002, 05:22 PM

Since the last posting I have figured out my problems and learned some things.

First, the LangIDToCodePage code returns the same value as _getmbcp().

Second, in my _MBCS compile I was using what, I believe, are called character codes for GetGlyphOutline. This does not work in general for a _UNICODE compile. When I used glyph indices (which, for ASCII codes < 127, differ from the character codes by 29) and set the GGO_GLYPH_INDEX flag in GetGlyphOutline, my problem went away. I used GetCharacterPlacement to get the glyph indices.

**Yves M** · October 12th, 2002, 07:07 PM

Originally posted by Bob H
First, the LangIDToCodePage code returns the same value as _getmbcp().

Does that mean that _getmbcp also works correctly when you switch code pages during the execution of the program? That is, if a Japanese person needs to insert some characters in English, Russian, or whatever, and switches keyboard locales while your program is running?

**Bob H** · October 12th, 2002, 07:43 PM

I am fairly sure that English can be entered from a Japanese keyboard. I get a lot of emails from Japan in English.

**Yves M** · October 13th, 2002, 09:27 AM

Well, I can also enter Japanese on my Spanish or my Swiss keyboards when I switch input locales.
