-
September 26th, 2009, 01:59 AM
#1
How to tell if a unicode TCHAR can be converted to ANSI char?
At this point I'm calling WideCharToMultiByte(CP_ACP, WC_DEFAULTCHAR, ..., &bUsedDefaultChar) and then checking whether bUsedDefaultChar was set to TRUE, which would mean the conversion is not possible. That seems like overkill for a single TCHAR. Can someone suggest a better way to do it?
-
September 26th, 2009, 02:35 AM
#2
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
I may be wrong, but you can check whether the TCHAR value is below 128; the first 128 code points are plain ASCII, and every ANSI code page maps them identically.
-
September 26th, 2009, 02:45 AM
#3
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
Thanks, but it's not that simple: some non-English letters can still be converted if the system's default ANSI code page happens to contain them. Unfortunately I can't test that myself on this PC.
Well, if there's no other way than calling WideCharToMultiByte, is the following acceptable?
Code:
BOOL CheckAcceptableChar(TCHAR ch)
{
    BOOL bDefaultUsed = TRUE;
    // Size-query call (cbMultiByte == 0): no output buffer is needed,
    // but lpUsedDefaultChar is still filled in.
    VERIFY(WideCharToMultiByte(CP_ACP, 0, &ch, 1, NULL, 0, NULL, &bDefaultUsed));
    return !bDefaultUsed;
}
-
September 26th, 2009, 01:59 PM
#5
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
Why do you want to do this? And why one character at a time?
Do you care whether the conversion is round-trippable? (i.e., should you use WC_NO_BEST_FIT_CHARS?)
Passing a WCHAR (wchar_t) instead of TCHAR would make more sense. The problem is that a single WCHAR isn't always a single "character". Two examples: 1) Unicode characters outside the BMP (above 0xFFFF) require two WCHARs (a surrogate pair) to represent one character. 2) "Decomposed" Unicode characters: for example, 0x0041 + 0x0308 = capital A + combining diaeresis = Ä, while the "precomposed" form of Ä is the single code point 0x00C4.
So if you *know* that all Unicode characters are "precomposed", and you will never have Unicode characters outside the BMP - then you could pass in a single WCHAR, representing a single Unicode character. But why one "character" at a time?
Code:
#include <windows.h>
#include <iostream>
#include <iomanip>
using namespace std;

BOOL CP_Convertable(const WCHAR *p, UINT len, UINT cp = CP_ACP, BOOL bNoBestFit = TRUE)
{
    // Ajay Vijay optimization :)
    // all code pages and Unicode are the same below 127
    if ((len == 1) && (*p < 127))
        return TRUE;

    BOOL bDefaultUsed = TRUE;
    DWORD flags = bNoBestFit ? WC_NO_BEST_FIT_CHARS : 0;
    int rc;
    for (;;)
    {
        rc = WideCharToMultiByte(cp, flags, p, len, 0, 0, 0, &bDefaultUsed);
        if (!rc && flags && (GetLastError() == ERROR_INVALID_FLAGS))
        {
            // flags may not be valid for the given code page - retry without them
            flags = 0;
            continue;
        }
        break;
    }//for
    return rc && !bDefaultUsed;
}//CP_Convertable

void test_wchar(WCHAR c, UINT cp)
{
    char old_fill = cout.fill('0');
    if (CP_Convertable(&c, 1, cp))
    {
        cout << "0x" << setw(4) << hex << c << dec
             << " is convertible to CP " << cp << endl;
    }
    else
    {
        cout << "0x" << setw(4) << hex << c << dec
             << " is NOT convertible to CP " << cp << endl;
    }
    cout.fill(old_fill);
}//test_wchar

int main()
{
    UINT cp = 1250;        // test with Windows-1250 (Central European)
    WCHAR w_good = 0x0107; // LATIN SMALL LETTER C WITH ACUTE, 0xE6 in cp1250
    WCHAR w_bad  = 0xFF99; // HALFWIDTH KATAKANA LETTER RU, not in cp1250
    test_wchar(w_good, cp);
    test_wchar(w_bad, cp);
    return 0;
}//main