Extended ASCII character set
eug_prog
April 14th, 2003, 04:27 PM
Folks,
Does anybody know how to determine whether an extended ASCII character is a valid printable character in a localized language?
In other words, for the standard ASCII character set, I can call:
#include <locale.h>
#include <ctype.h>

setlocale(LC_CTYPE, "language_of_my_choice");
int ac;
// init stuff to do here ...
if (__isascii(ac) && isprint(ac)) {
    // found an ASCII printable character!
}
The code works fine for the standard ASCII character set, but it doesn't seem to work for extended characters, for a simple reason: in the macro definitions of __isascii() and isprint(), the last possible character is 0x7F, and I need something that goes up to 0xFF.
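Note that the `__isascii(ac)` test in the snippet rejects every value above 0x7F before `isprint()` is even consulted, so that condition can never accept an extended character. A minimal sketch of the alternative, assuming the C runtime's `isprint()` classifies high bytes according to the active `LC_CTYPE` locale (common runtimes do this for single-byte locales): drop the `__isascii()` guard and pass the byte as an `unsigned char`. The helper names `is_locale_printable` and `count_extended_printables` are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: nonzero if byte c is printable in the current
   LC_CTYPE locale. There is deliberately no __isascii() guard, so bytes
   0x80..0xFF are classified by the locale instead of being rejected. */
static int is_locale_printable(unsigned char c)
{
    /* isprint() is only defined for EOF and values representable as
       unsigned char; taking an unsigned char avoids passing a negative
       plain char for bytes >= 0x80. */
    return isprint(c) != 0;
}

/* Hypothetical helper: how many bytes in 0x80..UCHAR_MAX the current
   locale considers printable. In the default "C" locale this is
   typically 0; after switching LC_CTYPE to a Western European
   single-byte locale it should be much larger. */
static int count_extended_printables(void)
{
    int count = 0;
    for (int c = 0x80; c <= UCHAR_MAX; ++c)
        if (is_locale_printable((unsigned char)c))
            ++count;
    return count;
}
```

For example, call `setlocale(LC_CTYPE, "French_France.1252")` with MSVC (or something like `"fr_FR.ISO-8859-1"` on a glibc system where that locale is installed), check that the return value is not NULL, and only then rely on the results.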
Do you know of a function that would tell me that, for example, the character 'ä' (or 'é', etc.) is a valid printable character for, say, French?
eug_prog
April 14th, 2003, 04:30 PM
Oops... for some reason the characters 'a' with a tilde and 'e' with an acute accent got converted to Slavic letters. I don't know why, though.
Anyway, even for these Slavic letters, the standard ASCII routines fail. :(
Yves M
April 14th, 2003, 05:38 PM
It really depends on the codepage you are using. There are no "standard" functions to tell you whether a character is printable in a given codepage. Depending on what you want to use the codepages for, it might be better to write the program to handle text in Unicode internally and then determine the printable characters from there (which, unfortunately, is again not trivial, but there is a lot of information on unicode.org).
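A minimal sketch of the "Unicode internally" idea using only standard C95 functions: `btowc()` decodes a single byte with the current `LC_CTYPE` locale's codepage, and `iswprint()` classifies the resulting wide character. Assumption: `wchar_t` holds Unicode values, which is the case in practice on Windows (UTF-16) and glibc (UCS-4) but is not guaranteed by the C standard. `is_printable_via_wide` is a hypothetical name.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: decode byte c with the current LC_CTYPE locale,
   then classify the resulting wide character. btowc() returns WEOF when
   c is not a valid single-byte character in that locale (for example,
   an unassigned position in the codepage); that is treated as
   "not printable". */
static int is_printable_via_wide(unsigned char c)
{
    wint_t wc = btowc(c);
    if (wc == WEOF)
        return 0;
    return iswprint(wc) != 0;
}
```

The codepage knowledge still comes from `setlocale()` here; the benefit of converting text to wide characters once, at input, is that the rest of the program can classify characters without caring which codepage they came from.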
The quick and dirty fix would be to get the information about the codepages you want to support (there are quite a few; see, for example, the list of codepages supported by MS Windows (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp)). On Windows you can use GetStringTypeEx (or GetStringTypeExW) to determine what kind of characters your string is made of.
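The "quick and dirty" per-codepage route can be as simple as a hard-coded table. A sketch for Windows-1252 (the usual Windows codepage for French), assuming the published table in which 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned. Counting the no-break space (0xA0) and soft hyphen (0xAD) as printable is a judgment call of this sketch. `cp1252_is_printable` is a hypothetical name.

```c
/* Hypothetical helper: nonzero if byte c is a printable character in
   Windows-1252. Assumes the published code page table: 0x00-0x1F and
   0x7F are control codes, and 0x81, 0x8D, 0x8F, 0x90 and 0x9D are
   unassigned. Everything else (including 0x80 Euro sign, 0xA0 no-break
   space and 0xAD soft hyphen) is treated as printable. */
static int cp1252_is_printable(unsigned char c)
{
    if (c < 0x20 || c == 0x7F)
        return 0;
    switch (c) {
    case 0x81: case 0x8D: case 0x8F: case 0x90: case 0x9D:
        return 0; /* unassigned in Windows-1252 */
    default:
        return 1;
    }
}
```

Supporting another codepage means adding another table, which is exactly why this approach scales poorly as the list of codepages grows.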
If you are looking for a more platform-independent solution, check out GNU's libiconv (http://www.gnu.org/software/libiconv/).
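A sketch of the iconv route, assuming the POSIX `iconv_open()`/`iconv()`/`iconv_close()` interface (which both GNU libiconv and glibc provide): decode the byte from a named codepage into UCS-4 and treat a failed conversion as "not a character in this codepage". The final printability test is a deliberately crude assumption that rejects only the C0 controls, DEL and the C1 controls; real Unicode classification is more involved. `iconv_is_printable` is hypothetical, and the encoding names that `iconv_open()` accepts vary between implementations.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `byte` decodes to a non-control
   character in `codepage`, 0 if it is a control code or has no mapping
   in that codepage, and -1 if iconv cannot convert from `codepage` at
   all. Names such as "ISO-8859-1" or "CP1252" are passed straight to
   iconv_open() and are implementation-dependent. */
static int iconv_is_printable(const char *codepage, unsigned char byte)
{
    iconv_t cd = iconv_open("UCS-4BE", codepage);
    if (cd == (iconv_t)-1)
        return -1;

    unsigned char in[1] = { byte };
    unsigned char out[4];
    char *inp = (char *)in;
    char *outp = (char *)out;
    size_t inleft = sizeof in;
    size_t outleft = sizeof out;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    /* Conversion failed: the byte has no mapping in this codepage. */
    if (rc == (size_t)-1 || outleft != 0)
        return 0;

    /* Reassemble the big-endian UCS-4 code point. */
    unsigned long cp = ((unsigned long)out[0] << 24)
                     | ((unsigned long)out[1] << 16)
                     | ((unsigned long)out[2] << 8)
                     | (unsigned long)out[3];

    /* Crude assumption: anything except C0 controls, DEL and C1
       controls counts as printable. */
    if (cp < 0x20 || (cp >= 0x7F && cp <= 0x9F))
        return 0;
    return 1;
}
```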
To see the problems that arise, look at this page, for example (http://czyborra.com/charsets/iso8859.html#ISO-8859-4). You will see that different codepages have "blanks" (unassigned positions) in different places, so you can never be sure whether a character is printable in a given codepage unless your program knows, one way or another, something about that codepage.
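To make the "blanks in different places" point concrete: in ISO-8859-1 every position from 0xA0 to 0xFF is assigned, while ISO-8859-3 (Latin-3) leaves 0xA5, 0xAE, 0xBE, 0xC3, 0xD0, 0xE3 and 0xF0 unassigned. A minimal sketch, assuming those published tables and treating the C0/C1 controls and DEL as non-printable in both; the enum and function names are hypothetical.

```c
/* Hypothetical codepage identifiers for this sketch. */
enum codepage { CP_ISO_8859_1, CP_ISO_8859_3 };

/* Nonzero if byte c is a printable, assigned character in codepage cp.
   Both codepages reserve 0x00-0x1F, 0x7F and 0x80-0x9F for control
   codes; for this check the only other difference that matters is
   which upper-half positions Latin-3 leaves blank. */
static int iso8859_is_printable(enum codepage cp, unsigned char c)
{
    if (c < 0x20 || (c >= 0x7F && c <= 0x9F))
        return 0;
    if (cp == CP_ISO_8859_3) {
        switch (c) {
        case 0xA5: case 0xAE: case 0xBE: case 0xC3:
        case 0xD0: case 0xE3: case 0xF0:
            return 0; /* unassigned in Latin-3 */
        default:
            break;
        }
    }
    return 1;
}
```

A byte such as 0xE3, which is 'ã' in Latin-1, is therefore printable in one codepage and a hole in the other.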
eug_prog
April 15th, 2003, 09:09 AM
Yves,
Thanks for your suggestions!
I am using the MS Dev IDE, so libiconv will not help much, unfortunately. I am a little cautious about the WinAPI, though, because I am coding SMTP traps, and they are a little lower-level than the WinAPI. Still, I will try the WinAPI solution just to see how it works.
It's a good starting point, in any case.
Richard.J
April 16th, 2003, 11:02 AM
Instead of the C-style isascii(), maybe the C++ standard library (see the std::ctype facet in <locale>) could be of help?
eug_prog
April 17th, 2003, 01:22 PM
Richard,
I would love to try out your suggestion, but our software is written in C, so no STL is at my disposal, alas.
Interesting suggestion, though!
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.