Thread: UTF-8

  1. #1
    Guest

    UTF-8

    Help me please! I need to convert a UTF-8 encoded text string to an ANSI string.


  2. #2
    Join Date
    Jun 1999
    Location
    Cardiff, UK
    Posts
    11

    Re: UTF-8

    I think this should do it:

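    // Converts UTF-8 to single-byte text. Only code points U+0000..U+00FF
    // (one- and two-byte sequences) are handled, so the result is Latin-1,
    // which matches the ANSI code page only on Western locales.
    // Returns a pointer to a static buffer, so it is not thread-safe.
    // UTF-8 is byte-oriented, hence LPCSTR rather than LPCTSTR.
    static LPCSTR UTF8toANSI(LPCSTR s)
    {
        static char buf[BUFSIZ];
        char *p = buf;
        char *end = buf + BUFSIZ - 1;   // leave room for the terminator
        while (*s && p < end)
        {
            BYTE c = (BYTE)*s;
            // first byte of a 2-byte UTF-8 char, followed by a continuation byte
            if ((c & 0xE0) == 0xC0 && ((BYTE)s[1] & 0xC0) == 0x80)
            {
                int n = ((c & 0x1F) << 6) | ((BYTE)s[1] & 0x3F);
                *p++ = (n <= 0xFF) ? (char)n : '?';   // no single-byte equivalent above 255
                s += 2;
            }
            else
                *p++ = *s++;   // ASCII, or a byte this function does not decode
        }
        *p = '\0';
        return buf;
    }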






  3. #3
    Guest

    Re: UTF-8

    Thank you very much, Steve!
    I tried your algorithm, but two-byte characters are not converted properly.
    Can you give me a link to the UTF-8 specification, if possible?

    Best regards,
    Michael



  4. #4
    Join Date
    Jun 1999
    Location
    Cardiff, UK
    Posts
    11

    Re: UTF-8

    I've beaten up the guy who did this, and his story is that he believes it ONLY works for chars in the range 0-255, since he couldn't see how to translate larger chars into ANSI.

    It was actually lifted from a bit of specialized code that simply had to keep an external lump of code sweet by using utf-8.

    One good reference I found:

    http://czyborra.com/utf/

    If you make progress or identify a problem with the code, could you let me know?
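
    For what it's worth, here is a minimal sketch of what a fuller decoder might look like, covering UTF-8 sequences of one to four bytes. The names decodeUtf8 and utf8ToLatin1 are made up for illustration and are not from the original code. It assumes the target is Latin-1 (which only coincides with ANSI on Western code pages), substitutes '?' for anything above U+00FF, and does not reject overlong encodings or surrogates, so treat it as a starting point rather than a validated converter.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the thread): decodes one UTF-8 sequence
// starting at s[i], advances i past the bytes it consumed, and returns the
// code point. Returns 0xFFFD (the Unicode replacement character) on
// malformed or truncated input.
std::uint32_t decodeUtf8(const std::string& s, std::size_t& i)
{
    const unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;                                     // 0xxxxxxx: ASCII

    int extra;                                                  // continuation bytes expected
    std::uint32_t cp;
    if      ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }  // 110xxxxx
    else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }  // 1110xxxx
    else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }  // 11110xxx
    else return 0xFFFD;                                         // invalid lead byte

    for (int k = 0; k < extra; ++k)
    {
        if (i >= s.size()) return 0xFFFD;                       // truncated sequence
        const unsigned char cc = static_cast<unsigned char>(s[i]);
        if ((cc & 0xC0) != 0x80) return 0xFFFD;                 // not 10xxxxxx
        cp = (cp << 6) | (cc & 0x3F);                           // append 6 payload bits
        ++i;
    }
    return cp;
}

// Converts UTF-8 to Latin-1 (ISO 8859-1). Code points above U+00FF have no
// single-byte Latin-1 representation and are replaced by '?'.
std::string utf8ToLatin1(const std::string& utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); )
    {
        const std::uint32_t cp = decodeUtf8(utf8, i);
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    return out;
}
```

    With this sketch, the two bytes C3 A9 (é, U+00E9) come out as the single byte E9, while the three bytes E2 82 AC (the euro sign, U+20AC) come out as '?', which is the "larger chars" problem mentioned above.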



  5. #5
    Join Date
    Jun 1999
    Posts
    27

    Re: UTF-8

    Hi,

    If you want to convert UTF-8 characters that need more than 8 bits (more than one byte in UTF-8), you need to take the code page into account when you handle the resulting ANSI string (and the screen font when you display it). Alternatively, you can convert directly to Unicode.
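
    To illustrate why the code page matters, here is a small sketch. The helpers toCp1252 and toCp1251 are hypothetical and deliberately cover only ASCII plus one block each of Windows-1252 and Windows-1251; a real conversion would rely on the full tables (or on the system API) instead.

```cpp
#include <cstdint>

// Hypothetical helpers (not from the thread). Each returns the single byte
// that stores code point cp in the given code page, or -1 if this sketch
// does not cover that character.

// Windows-1252 (Western): U+00A0..U+00FF are stored as bytes 0xA0..0xFF.
int toCp1252(std::uint32_t cp)
{
    if (cp < 0x80) return static_cast<int>(cp);                 // ASCII
    if (cp >= 0xA0 && cp <= 0xFF) return static_cast<int>(cp);  // Latin-1 supplement
    return -1;  // the 0x80..0x9F block is omitted in this sketch
}

// Windows-1251 (Cyrillic): U+0410..U+044F are stored as bytes 0xC0..0xFF.
int toCp1251(std::uint32_t cp)
{
    if (cp < 0x80) return static_cast<int>(cp);                 // ASCII
    if (cp >= 0x0410 && cp <= 0x044F)
        return static_cast<int>(cp - 0x0410 + 0xC0);            // basic Cyrillic letters
    return -1;  // other Cyrillic letters and symbols are omitted in this sketch
}
```

    Here toCp1252(0x00E9) (é) and toCp1251(0x0439) (й) both give the byte 0xE9, so the same ANSI byte displays as a different character depending on the code page in effect.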

    The API function "WideCharToMultiByte" also seems to accept "real" UTF-8 strings, but I have never tried it. It may be a good starting point for finding more information.

    Hope this helps,

    Greetz
    Jean-Marie


  6. #6
    Guest

    Re: UTF-8

    Hi, Steve!

    I solved my problem by using the API function WideCharToMultiByte.

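    // src is UTF-8, which is byte-oriented, so it is declared LPCSTR (not LPCTSTR)
    void CText::UTF8toANSY(LPCSTR src, CString &dst)
    {
        dst.Empty();

        // length of the wide string in WCHARs, including the terminating null
        int nLenW = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
        if (nLenW <= 0)
            return; // conversion failed

        LPWSTR lpszW = new WCHAR[nLenW]; // new wide-char string
        // UTF-8 -> wide chars; this step exists only so that WideCharToMultiByte can be used
        MultiByteToWideChar(CP_UTF8, 0, src, -1, lpszW, nLenW);

        // size of the ANSI string in bytes, including the terminating null;
        // on double-byte code pages this can exceed the number of wide chars
        int nLenA = WideCharToMultiByte(CP_ACP, 0, lpszW, -1, NULL, 0, NULL, NULL);
        if (nLenA > 0)
        {
            LPSTR lpszA = new char[nLenA]; // new ANSI string
            // conversion to ANSI (CP_ACP)
            WideCharToMultiByte(CP_ACP, 0, lpszW, -1, lpszA, nLenA, NULL, NULL);
            dst = lpszA;
            delete[] lpszA;
        }
        delete[] lpszW;
    }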






    WideCharToMultiByte maps a wide-character string to a multi-byte string (ANSI encodings included). Of course, the whole range of wide characters cannot be mapped to ANSI, but characters that exist in the current code page are mapped properly!

    Regards,
    Michael



  7. #7
    Guest

    This code works!

    Thank you, Jean-Marie!

    This code works!



    Regards,
    Michael


