-
September 26th, 2009, 01:59 AM
#1
How to tell if a unicode TCHAR can be converted to ANSI char?
At this point I'm calling WideCharToMultiByte(CP_ACP, WC_DEFAULTCHAR, ..., &bUsedDefaultChar) and then checking whether bUsedDefaultChar was set to TRUE, which would mean the conversion is not possible. That seems like overkill for a single TCHAR. Can someone suggest a better way to do it?
-
September 26th, 2009, 02:35 AM
#2
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
I may be wrong, but you can check whether the TCHAR value is below 128; the first 128 code points are plain ASCII, and every ANSI code page maps them identically.
-
September 26th, 2009, 02:45 AM
#3
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
Thanks, but it's not that simple: some non-English letters can still be converted if the system's default ANSI code page happens to contain them. Unfortunately I can't test that myself on this PC.
Well, if there's no other way than calling WideCharToMultiByte, is the following acceptable?
Code:
BOOL CheckAcceptableChar(TCHAR ch)
{
    BOOL bDefaultUsed = TRUE;
    // Size-query call (cbMultiByte == 0): no output buffer is needed,
    // but lpUsedDefaultChar is still filled in.
    VERIFY(WideCharToMultiByte(CP_ACP, 0, &ch, 1, NULL, 0, NULL, &bDefaultUsed));
    return !bDefaultUsed;
}
-
September 26th, 2009, 01:59 PM
#5
Re: How to tell if a unicode TCHAR can be converted to ANSI char?
Why do you want to do this? And why one character at a time?
Do you care whether the conversion is round-trippable? (i.e., should you use WC_NO_BEST_FIT_CHARS?)
Passing a WCHAR (wchar_t) instead of TCHAR would make more sense. The problem is that a single WCHAR isn't always a single "character". Two examples: 1) Unicode characters outside the BMP (above 0xFFFF) require two WCHARs (a surrogate pair) to represent one character. 2) "Decomposed" Unicode characters: for example, 0x0041 + 0x0308 = capital A + combining diaeresis = Ä, while the "precomposed" form of Ä is the single code point 0x00C4.
So if you *know* that all Unicode characters are "precomposed", and you will never have Unicode characters outside the BMP - then you could pass in a single WCHAR, representing a single Unicode character. But why one "character" at a time?
Code:
#include <windows.h>
#include <iostream>
#include <iomanip>
using namespace std;

BOOL CP_Convertable(const WCHAR *p, UINT len, UINT cp = CP_ACP, BOOL bNoBestFit = TRUE)
{
    // Ajay Vijay optimization :)
    // all code pages and Unicode are the same below 127
    if ((len == 1) && (*p < 127))
        return TRUE;

    BOOL bDefaultUsed = TRUE;
    DWORD flags = bNoBestFit ? WC_NO_BEST_FIT_CHARS : 0;
    int rc;
    for (;;)
    {
        rc = WideCharToMultiByte(cp, flags, p, len, 0, 0, 0, &bDefaultUsed);
        if (!rc && flags && (GetLastError() == ERROR_INVALID_FLAGS))
        {
            // flags may not be valid for the given code page - retry without them
            flags = 0;
            continue;
        }
        break;
    }//for
    return rc && !bDefaultUsed;
}//CP_Convertable

void test_wchar(WCHAR c, UINT cp)
{
    char old_fill = cout.fill('0');
    if (CP_Convertable(&c, 1, cp))
    {
        cout << "0x" << setw(4) << hex << c << dec
             << " is convertible to CP " << cp << endl;
    }
    else
    {
        cout << "0x" << setw(4) << hex << c << dec
             << " is NOT convertible to CP " << cp << endl;
    }
    cout.fill(old_fill);
}//test_wchar

int main()
{
    UINT cp = 1250;        // test with Windows-1250 (Central European)
    WCHAR w_good = 0x0107; // LATIN SMALL LETTER C WITH ACUTE, 0xE6 in cp1250
    WCHAR w_bad  = 0xFF99; // HALFWIDTH KATAKANA LETTER RU, not in cp1250
    test_wchar(w_good, cp);
    test_wchar(w_bad, cp);
    return 0;
}//main