  1. #1
    Join Date
    Apr 2005
    Posts
    78

    How to convert 2 wchar_t to 1 wchar_t

    In a nutshell: I have a wstring, but each consecutive pair of wchar_t values needs to be converted to a single wchar_t, which is then the character code to use for display.
    How best to do the conversion?


    How did it end up like this? I'm downloading foreign web pages that use a different charset for their content, although the HTML file itself is ASCII. So a browser would know to take two consecutive chars and combine them into a single wchar. I read the contents into a wstring, so each char is converted to a wchar_t. This much I do not want to change.

    Thanks in advance.

  2. #2
    Join Date
    Nov 2003
    Posts
    1,902

    Re: How to convert 2 wchar_t to 1 wchar_t

    Could you attach a sample file that you are trying to read? And some code demonstrating how you are currently reading the file.

    gg

  3. #3
    Join Date
    Apr 2005
    Posts
    78

    Re: How to convert 2 wchar_t to 1 wchar_t

    I think it's me getting confused. I'm reading in and parsing a downloaded web page, but it uses the Cyrillic character set. The file itself is still ASCII, as all web pages are?!

    But as I understand it now, I need to switch code pages to map the bytes onto the Cyrillic characters. All this is very confusing until you know how.

    So I was originally thinking there was a 2:1 mapping of characters when a web page's content is in something other than our character set. But I'm wrong: I think it's still 1:1, and I must pass a code-page parameter when converting these strings.

    I'm taking a subset of this (Cyrillic) content and displaying it in a CListCtrl. It's just not happening yet.
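
If each wchar_t in the wstring really holds one raw byte of the download (an assumption; it depends on how the file was read), the original bytes can be recovered by narrowing, without changing the reading code. A minimal sketch, where `wide_to_bytes` is a hypothetical helper name and not part of any library:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: recover the raw bytes from a wstring in which every
// wchar_t was produced by widening a single byte (value 0..255).
// Throws std::range_error if an element is outside that range, for example
// when a signed char was sign-extended during the widening.
std::string wide_to_bytes(const std::wstring &widened)
{
    std::string bytes;
    bytes.reserve(widened.size());
    for (wchar_t wc : widened)
    {
        // wchar_t may be signed or unsigned depending on the platform
        const unsigned long value = static_cast<unsigned long>(wc);
        if (value > 0xFF)
            throw std::range_error("wchar_t does not hold a single byte");
        bytes.push_back(static_cast<char>(static_cast<unsigned char>(value)));
    }
    assert(bytes.size() == widened.size()); // exactly one byte per wchar_t
    return bytes;
}
```

The resulting std::string is the byte sequence that a code-page conversion expects as input.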

  4. #4
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: How to convert 2 wchar_t to 1 wchar_t

    The web page may be encoded in UTF-8 Unicode. That's not quite the same thing as using a code page.
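
To illustrate the difference: in UTF-8 a Cyrillic letter occupies two bytes, whereas a single-byte code page stores it in one. The following is only a sketch of UTF-8 decoding under simplifying assumptions (the function name `utf8_to_code_points` is made up here, and overlong forms and surrogates are not rejected):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 into Unicode code points. Minimal sketch: malformed or
// truncated sequences become U+FFFD; overlong forms are not rejected.
std::u32string utf8_to_code_points(const std::string &utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; } // invalid lead byte
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            // Each continuation byte must look like 10xxxxxx
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    assert(out.size() <= utf8.size()); // never more code points than bytes
    return out;
}
```

For example, the two bytes 0xD0 0x9F decode to the single code point U+041F (the Cyrillic capital letter Pe).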

  5. #5
    Join Date
    Nov 2003
    Posts
    1,902

    Re: How to convert 2 wchar_t to 1 wchar_t

    The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML

    Once we know how the data is encoded, we can help with converting it to a wchar_t string.

    Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8

    gg

  6. #6
    Join Date
    Apr 2005
    Posts
    78

    Re: How to convert 2 wchar_t to 1 wchar_t

    Well, it's just a web page from a Ukrainian website that has the following:
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

    I need to take a subset of the text from this page and display it as Cyrillic within a CListCtrl.
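
Reading the declared charset out of the downloaded bytes can be sketched as below. `find_charset` is a hypothetical helper, not a library function; it only looks for the first `charset=` in the text and does no real HTML parsing. The HTTP Content-Type header can also declare a charset, which this sketch does not look at.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns the value following the first "charset=" in
// the HTML, lowercased, or an empty string if none is found.
std::string find_charset(const std::string &html)
{
    // Lowercase a copy so the search is case-insensitive
    std::string lower(html);
    for (char &c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();
    assert(pos <= lower.size());

    // Skip an optional opening quote, as in <meta charset="utf-8">
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;

    // The name ends at the first quote, space, semicolon or '>'
    const std::string::size_type end = lower.find_first_of("\"' ;>", pos);
    return lower.substr(pos, end == std::string::npos ? std::string::npos
                                                      : end - pos);
}
```

For the meta tag quoted above this yields "windows-1251", which then has to be mapped to a code page number such as 1251.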

  7. #7
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: How to convert 2 wchar_t to 1 wchar_t

    The wikipedia page on 1251:
    http://en.wikipedia.org/wiki/Windows-1251
    You can use that page to create a 256-element array which maps the characters in the HTML document to their Unicode equivalents. Since none of the Unicode values on the page is excessively large, you should be able to drop them into a wchar_t directly, no need for anything special to make them UTF-16.
    Last edited by Lindley; September 9th, 2009 at 03:41 PM.
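
The table approach described above might look like the following sketch. Instead of a full 256-element array it uses a 64-entry table for bytes 0x80..0xBF plus two contiguous ranges (ASCII, and 0xC0..0xFF mapping to U+0410..U+044F). The names `kCp1251High`, `cp1251_to_wchar` and `cp1251_to_wstring` are invented for this example, and mapping the unassigned byte 0x98 to U+FFFD is a choice made here, not something the code page defines.

```cpp
#include <cassert>
#include <string>

// Unicode code points for Windows-1251 bytes 0x80..0xBF.
// 0x98 is unassigned in the code page; U+FFFD stands in for it here.
static const unsigned short kCp1251High[64] = {
    0x0402, 0x0403, 0x201A, 0x0453, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80
    0x20AC, 0x2030, 0x0409, 0x2039, 0x040A, 0x040C, 0x040B, 0x040F, // 0x88
    0x0452, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90
    0xFFFD, 0x2122, 0x0459, 0x203A, 0x045A, 0x045C, 0x045B, 0x045F, // 0x98
    0x00A0, 0x040E, 0x045E, 0x0408, 0x00A4, 0x0490, 0x00A6, 0x00A7, // 0xA0
    0x0401, 0x00A9, 0x0404, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x0407, // 0xA8
    0x00B0, 0x00B1, 0x0406, 0x0456, 0x0491, 0x00B5, 0x00B6, 0x00B7, // 0xB0
    0x0451, 0x2116, 0x0454, 0x00BB, 0x0458, 0x0405, 0x0455, 0x0457  // 0xB8
};

// Maps one Windows-1251 byte to its Unicode value.
wchar_t cp1251_to_wchar(unsigned char byte)
{
    if (byte < 0x80)                       // ASCII is unchanged
        return static_cast<wchar_t>(byte);
    if (byte >= 0xC0)                      // 0xC0..0xFF -> U+0410..U+044F
        return static_cast<wchar_t>(0x0410 + (byte - 0xC0));
    return static_cast<wchar_t>(kCp1251High[byte - 0x80]);
}

// Converts a whole Windows-1251 byte string.
std::wstring cp1251_to_wstring(const std::string &bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(cp1251_to_wchar(static_cast<unsigned char>(c)));
    assert(out.size() == bytes.size()); // single-byte code page: 1 byte -> 1 wchar_t
    return out;
}
```

Every value fits in 16 bits, so the result is valid UTF-16 on Windows without surrogate pairs.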

  8. #8
    Join Date
    Apr 2005
    Posts
    78

    Re: How to convert 2 wchar_t to 1 wchar_t

    Thanks.

    I was under the impression that I could use something like the CW2AEX helper class, specifying the proper code-page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor.

  9. #9
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: How to convert 2 wchar_t to 1 wchar_t

    Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.

  10. #10
    Join Date
    Apr 2005
    Posts
    78

    Re: How to convert 2 wchar_t to 1 wchar_t

    Me neither, I'm just picking it up as I go along. I'm at home right now, so I will try these approaches tomorrow and let you know how I get on. Can't wait to see some Cyrillic text in my CListCtrl. Don't suppose many people wish for such a thing :-)

  11. #11
    Join Date
    Nov 2003
    Posts
    1,902

    Re: How to convert 2 wchar_t to 1 wchar_t

    Here's a more generic version of the conversion code samples:
    Code:
    #include <windows.h>
    #include <string>
    
    // Converts a multi-byte string in code page cp to a UTF-16 std::wstring.
    // Returns L"ErrorA2W" if the conversion fails.
    std::wstring str_to_wstr(const std::string &str, UINT cp = CP_ACP)
    {
        // MultiByteToWideChar fails on zero-length input, so handle it here
        if (str.empty())
            return std::wstring();
    
        const int srclen = static_cast<int>(str.length());
    
        // First call: ask how many wchar_t's the result needs
        int len = MultiByteToWideChar(cp, 0, str.data(), srclen, 0, 0);
        if (!len)
            return L"ErrorA2W";
    
        // Second call: convert straight into the result's buffer. The explicit
        // length means embedded NULs survive and no terminator is needed.
        std::wstring wstr(len, L'\0');
        if (!MultiByteToWideChar(cp, 0, str.data(), srclen, &wstr[0], len))
            return L"ErrorA2W";
    
        return wstr;
    }//str_to_wstr
    
    // Converts a UTF-16 std::wstring to a multi-byte string in code page cp.
    // Returns "ErrorW2A" if the conversion fails.
    std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
    {
        // WideCharToMultiByte fails on zero-length input, so handle it here
        if (wstr.empty())
            return std::string();
    
        const int srclen = static_cast<int>(wstr.length());
    
        // First call: ask how many bytes the result needs
        int len = WideCharToMultiByte(cp, 0, wstr.data(), srclen, 0, 0, 0, 0);
        if (!len)
            return "ErrorW2A";
    
        // Second call: convert straight into the result's buffer
        std::string str(len, '\0');
        if (!WideCharToMultiByte(cp, 0, wstr.data(), srclen, &str[0], len, 0, 0))
            return "ErrorW2A";
    
        return str;
    }//wstr_to_str
    So you take a Cyrillic string, extracted from the HTML, and call:
    Code:
    std::wstring cyrillic_wstr = str_to_wstr(cyrillic_str, 1251);
    gg
    Last edited by Codeplug; September 18th, 2009 at 08:24 AM. Reason: bug fix

  12. #12
    Join Date
    Apr 2005
    Posts
    78

    Re: How to convert 2 wchar_t to 1 wchar_t

    Excellent, thanks. I will give these a try tomorrow when I'm back at my desk.

  13. #13
    Join Date
    Nov 2003
    Posts
    1,902

    Re: How to convert 2 wchar_t to 1 wchar_t

    This will give the same results using ATL tools:
    Code:
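    // CA2WEX comes from <atlconv.h>; direct-initialization lets its LPWSTR conversion feed the wstring constructor
    std::wstring cyrillic_wstr(ATL::CA2WEX<>(cyrillic_str.c_str(), 1251));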
    gg
