June 22nd, 1999, 10:19 AM
Help me please!!! I need to convert a string of text encoded in UTF-8 to an ANSI string.
Steve Kearon June 22nd, 1999, 10:57 AM
I think this should do it:

```cpp
static LPCTSTR UTF8toANSI(LPCTSTR s)
{
    static TCHAR buf[BUFSIZ];
    LPTSTR p = buf;
    // stop at the end of the input or when buf is full (one slot is kept for the terminator)
    while (*s && p < buf + BUFSIZ - 1)
    {
        // if it's the first byte of a 2-byte UTF-8 char (and a second byte follows)
        if ((TBYTE(*s) & 0xE0) == 0xC0 && s[1])
        {
            BYTE b1 = (*s++ & 0x1F);   // low 5 bits of the lead byte
            BYTE b2 = (*s++ & 0x3F);   // low 6 bits of the continuation byte
            int n = b1 << 6 | b2;      // decoded code point (up to 0x7FF)
            *p++ = (TCHAR)n;           // only values up to 255 survive in an ANSI build
        }
        else
            *p++ = *s++;
    }
    *p = _T('\0');
    return buf;
}
```

June 22nd, 1999, 01:12 PM
Thank you very much, Steve! I tried your algorithm, but two-byte characters are not converted properly :( Can you give me a link to the UTF-8 specification, if possible?
Best regards,
Michael

Steve Kearon June 23rd, 1999, 02:23 AM
I've beaten up the guy who did this, and his story is that he believes it ONLY works for chars in the range 0-255, since he couldn't see how to translate larger chars into ANSI. It was actually lifted from a bit of specialized code that simply had to keep an external lump of code sweet by using UTF-8.
One good reference I found: http://czyborra.com/utf/
If you make progress or identify a problem with the code, could you let me know?

Jean-Marie Carl June 23rd, 1999, 09:28 AM
Hi,
When you want to convert UTF-8 characters "larger" than 8 bits (more than 1 byte in UTF-8), you need to take care of the code page when you process the resulting ANSI string (and of the screen font when you display it). Or you may convert directly to Unicode.
The API function "WideCharToMultiByte" also seems to allow using "real" UTF-8 strings, but I never tried it. Maybe it's a good entry point for finding more information.
Hope this helps,
Greetz
Jean-Marie

June 24th, 1999, 05:15 AM
Hi, Steve! I solved my problem by using the API function WideCharToMultiByte.
```cpp
void CText::UTF8toANSY(LPCSTR src, CString &dst)
{
    // determine the length of the UTF-8 string in wide characters (terminator included)
    int nLenW = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
    if (nLenW <= 0) { dst.Empty(); return; }
    LPWSTR lpszW = new WCHAR[nLenW];    // wide-character string

    // this step is needed only so that WideCharToMultiByte can be used
    MultiByteToWideChar(CP_UTF8, 0, src, -1, lpszW, nLenW);

    // determine the length of the ANSI string in bytes (terminator included)
    int nLenA = WideCharToMultiByte(CP_ACP, 0, lpszW, -1, NULL, 0, NULL, NULL);
    LPSTR lpszA = new CHAR[nLenA + 1];  // ANSI string

    // conversion to ANSI (CP_ACP)
    WideCharToMultiByte(CP_ACP, 0, lpszW, -1, lpszA, nLenA, NULL, NULL);
    lpszA[nLenA] = 0;

    dst = lpszA;

    delete[] lpszW;
    delete[] lpszA;
}
```

WideCharToMultiByte maps a wide-character string to a multi-byte string (ANSI encoding included). The whole range of wide characters cannot be mapped to ANSI, of course, but characters from the current code page are mapped properly!
Regards,
Michael

June 24th, 1999, 05:22 AM
Thank you, Jean-Marie! This code (the UTF8toANSY function above) works!
Regards,
Michael
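For readers who want to see why the hand-rolled decoder earlier in the thread stops working above 255, here is a minimal portable sketch of decoding UTF-8 by hand. It is not code from any poster. The function name `Utf8ToLatin1` is hypothetical, and ISO-8859-1 (Latin-1) is assumed as a stand-in for "ANSI" only because its byte values equal the code points 0-255; a real Windows ANSI code page (CP_ACP) may differ, which is the job `WideCharToMultiByte` does. Code points above 0xFF and malformed sequences are replaced with a fallback character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the thread): UTF-8 to ISO-8859-1 (Latin-1).
// Anything that does not fit in one Latin-1 byte becomes `fallback`.
std::string Utf8ToLatin1(const std::string& utf8, char fallback = '?')
{
    // Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
    static const std::uint32_t kMin[] = {0x0, 0x80, 0x800, 0x10000};

    std::string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;     // decoded code point
        std::size_t extra = 0;    // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else
        {
            // stray continuation byte or invalid lead byte
            out += fallback;
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            // every continuation byte must look like 10xxxxxx
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                ok = false;   // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }

        // reject overlong encodings such as C0 80
        if (ok && cp < kMin[extra])
            ok = false;

        out += (ok && cp <= 0xFF) ? static_cast<char>(cp) : fallback;
    }
    return out;
}
```

With this sketch, `Utf8ToLatin1("caf\xC3\xA9")` returns `"caf\xE9"`, while a Cyrillic letter such as `"\xD0\x96"` (U+0416) comes back as `"?"` because it has no Latin-1 equivalent. That is the same limitation Steve describes: a two-byte UTF-8 sequence can decode to a value as large as 0x7FF, which only fits in an 8-bit character if the target code page happens to contain it.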
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.