-
June 22nd, 1999, 10:19 AM
#1
UTF-8
Help me please! I need to convert a UTF-8 encoded text string to an ANSI string.
-
June 22nd, 1999, 10:57 AM
#2
Re: UTF-8
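I think this should do it:
#include <windows.h>
#include <stdio.h>   // BUFSIZ

// Converts UTF-8 to a single-byte string. Handles only 1- and 2-byte
// sequences; code points above 255 are written as '?'.
// Not thread-safe: the result lives in a static buffer.
static LPCSTR UTF8toANSI(LPCSTR s)
{
    static char buf[BUFSIZ];
    LPSTR p = buf;
    LPSTR end = buf + BUFSIZ - 1;   // leave room for the terminator
    while (*s && p < end)
    {
        // if it's the first byte of a 2-byte UTF-8 char followed by a continuation byte
        if (((BYTE)s[0] & 0xE0) == 0xC0 && ((BYTE)s[1] & 0xC0) == 0x80)
        {
            BYTE b1 = (BYTE)(*s++ & 0x1F);
            BYTE b2 = (BYTE)(*s++ & 0x3F);
            int n = (b1 << 6) | b2;
            *p++ = (n <= 0xFF) ? (char)n : '?';   // no single-byte equivalent above 255
        }
        else
            *p++ = *s++;
    }
    *p = '\0';
    return buf;
}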
-
June 22nd, 1999, 01:12 PM
#3
Re: UTF-8
Thank you very much, Steve!
I tried your algorithm, but two-byte characters are not converted properly.
Could you give me a link to the UTF-8 specification, if possible?
Best regards,
Michael
-
June 23rd, 1999, 02:23 AM
#4
Re: UTF-8
I've beaten up the guy who did this, and his story is that he believes it ONLY works for chars in the range 0-255, since he couldn't see how to translate larger chars into ANSI.
It was actually lifted from a bit of specialized code that simply had to keep an external lump of code sweet by using utf-8.
One good reference I found:
http://czyborra.com/utf/
If you make progress or identify a problem with the code, could you let me know?
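To make the 0-255 limitation concrete, here is a minimal portable sketch of the decoding step, written in standard C++ without any Win32 calls. The names decodeOne and utf8ToLatin1 are hypothetical and not from the code discussed above. It assumes the target is Latin-1, which agrees with a Western ANSI code page only for 0x00-0x7F and 0xA0-0xFF, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decodes one UTF-8 sequence starting at s[i], advances i
// past it, and returns the code point, or 0xFFFD for a malformed sequence.
// Simplification: overlong forms and encoded surrogates are not rejected.
static std::uint32_t decodeOne(const std::string& s, std::size_t& i)
{
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    int extra = 0;           // number of continuation bytes expected
    std::uint32_t cp = 0;    // code point being assembled
    if (b0 < 0x80)                { return b0; }                  // 0xxxxxxx
    else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }  // 110xxxxx
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }  // 1110xxxx
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }  // 11110xxx
    else                          { return 0xFFFD; }              // invalid lead byte
    for (; extra > 0; --extra)
    {
        if (i >= s.size())
            return 0xFFFD;                                        // truncated input
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80)
            return 0xFFFD;                                        // not 10xxxxxx
        cp = (cp << 6) | (b & 0x3F);
        ++i;
    }
    return cp;
}

// Hypothetical helper: converts UTF-8 to Latin-1 (ISO 8859-1), writing '?'
// for every code point above 0xFF and for malformed input.
static std::string utf8ToLatin1(const std::string& utf8)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const std::uint32_t cp = decodeOne(utf8, i);
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : '?');
    }
    return out;
}
```

With this sketch, the two bytes C3 A9 (U+00E9) become the single byte E9, while the three bytes E2 82 AC (U+20AC, the euro sign) come out as '?' because the code point is above 255.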
-
June 23rd, 1999, 09:28 AM
#5
Re: UTF-8
Hi,
When you convert UTF-8 characters that take more than one byte in UTF-8 (that is, anything outside 7-bit ASCII), you need to take the code page into account when you handle the resulting ANSI string (and the screen font when you display it). Alternatively, you can convert directly to Unicode.
The API function WideCharToMultiByte also seems to accept "real" UTF-8 strings, but I have never tried it. It may be a good starting point for finding more information.
Hope this helps,
Greetz
Jean-Marie
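As an illustration of converting directly to Unicode, the following is a rough portable sketch that turns UTF-8 into UTF-16 code units, the form Windows uses for wide strings. It is standard C++ and does not call the Win32 API. The function name utf8ToUtf16 is hypothetical. Malformed input is replaced with U+FFFD, and overlong forms and encoded surrogates are not rejected, so treat it as a sketch and not as a validating converter.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: converts UTF-8 to UTF-16 code units.
// Malformed sequences are replaced by U+FFFD.
static std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char b0 = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0xFFFD;   // stays U+FFFD for an invalid lead byte
        int extra = 0;               // continuation bytes still expected
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        for (; extra > 0; --extra)
        {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD;         // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;             // outside the Unicode range
        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above U+FFFF need a surrogate pair in UTF-16
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Once the text is in UTF-16, no code page is involved until something converts it back to a single-byte or multi-byte string.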
-
June 24th, 1999, 05:15 AM
#6
Re: UTF-8
Hi, Steve!
I solved my problem by using the API function WideCharToMultiByte.
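void CText::UTF8toANSY(LPCSTR src, CString &dst)
{
    dst.Empty();
    // Length of the wide-char string in WCHARs, including the terminating null
    int nLenW = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
    if (nLenW <= 0)
        return;                         // invalid input or API failure
    LPWSTR lpszW = new WCHAR[nLenW];    // intermediate wide-char string
    // This step is needed only so that WideCharToMultiByte can be used
    MultiByteToWideChar(CP_UTF8, 0, src, -1, lpszW, nLenW);
    // Length of the ANSI string in bytes, including the terminating null;
    // on double-byte code pages this can be larger than nLenW
    int nLenA = WideCharToMultiByte(CP_ACP, 0, lpszW, -1, NULL, 0, NULL, NULL);
    if (nLenA > 0)
    {
        LPSTR lpszA = new char[nLenA];  // ANSI string
        // Conversion to ANSI (CP_ACP)
        WideCharToMultiByte(CP_ACP, 0, lpszW, -1, lpszA, nLenA, NULL, NULL);
        dst = lpszA;
        delete[] lpszA;
    }
    delete[] lpszW;
}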
WideCharToMultiByte maps a wide-character string to a multi-byte string (which includes ANSI encodings). The whole range of wide characters cannot be mapped to ANSI, of course, but characters that exist in the current code page are mapped properly!
Regards,
Michael
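To illustrate which characters survive such a conversion, here is a small portable sketch that assumes the current code page is Windows-1252 (an assumption made only for this example; the real mapping depends on the system). The function toCp1252 and the table kCp1252High are hypothetical names. The usedDefault flag plays a role similar to the lpUsedDefaultChar parameter of WideCharToMultiByte: it reports whether any character had to be replaced. Unlike Windows, the sketch does no best-fit mapping and treats the unassigned bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D as unmappable.

```cpp
#include <string>

// Unicode code points for Windows-1252 bytes 0x80..0x9F; 0 marks an
// unassigned byte.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Hypothetical helper: maps code points to Windows-1252 bytes. Characters
// without a Windows-1252 equivalent become '?', and usedDefault is set to
// true if that happened at least once.
static std::string toCp1252(const std::u32string& in, bool& usedDefault)
{
    std::string out;
    usedDefault = false;
    for (char32_t cp : in)
    {
        char byte = '?';
        bool mapped = false;
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        {
            // ASCII and the upper Latin-1 range map to the same byte value
            byte = static_cast<char>(static_cast<unsigned char>(cp));
            mapped = true;
        }
        else
        {
            // Search the 0x80..0x9F block, skipping unassigned entries
            for (int k = 0; k < 32 && !mapped; ++k)
            {
                if (kCp1252High[k] != 0 && kCp1252High[k] == cp)
                {
                    byte = static_cast<char>(static_cast<unsigned char>(0x80 + k));
                    mapped = true;
                }
            }
        }
        if (!mapped)
            usedDefault = true;
        out.push_back(byte);
    }
    return out;
}
```

Under that assumption, U+20AC (the euro sign) maps to the byte 0x80, while a Cyrillic letter such as U+0416 has no equivalent and is replaced by '?'.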
-
June 24th, 1999, 05:22 AM
#7
This code works!
Thank you, Jean-Marie!
This code works!
Regards,
Michael