How to convert 2 wchar_t to 1 wchar_t
In a nutshell: I have a wstring, but each consecutive pair of wchar_t values is to be converted to a single wchar_t, which is then the character code to use for display purposes.
How best to do the conversion?
How did it end up like this? I am downloading foreign web pages that use a different charset for their content, but the HTML file itself is ASCII. So the browser would know to take two consecutive chars and convert them into a single wchar. I read the contents into a wstring, so each char is converted to a wchar_t. This much I do not want to change.
Thanks in advance.
Re: How to convert 2 wchar_t to 1 wchar_t
Could you attach a sample file that you are trying to read? And some code demonstrating how you are currently reading the file.
gg
Re: How to convert 2 wchar_t to 1 wchar_t
I think it's me getting confused. I'm reading in and parsing a downloaded web page, but it uses the Cyrillic character set. The file itself is still ASCII, as all web pages are?!
But as I understand it now, I need to switch code pages to map onto the Cyrillic characters. All this is very confusing until you know how.
So I was originally thinking there was a 2:1 mapping of characters when a web page's content is in something other than our character set. But I think I'm wrong: it's still 1:1, but I must pass a code-page parameter when converting these strings.
I'm taking out a subset of this (Cyrillic) content and displaying it in a CListCtrl. It's just not happening yet.
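If the wstring was filled by widening each byte of the file to one wchar_t, the original bytes can be recovered before any code-page conversion is applied. The following is only a sketch under that assumption; narrow_bytes is a hypothetical helper name, not something from this thread or from any library.

```cpp
#include <string>

// Recover the original bytes from a wstring that was built by widening
// each byte of the file to one wchar_t (an assumption about how the
// data was read). Masking with 0xFF also undoes sign extension, e.g.
// 0xFFC0 produced by widening a signed char holding 0xC0.
std::string narrow_bytes(const std::wstring& widened)
{
    std::string bytes;
    bytes.reserve(widened.size());
    for (wchar_t wc : widened)
        bytes.push_back(static_cast<char>(static_cast<unsigned long>(wc) & 0xFFu));
    return bytes;
}
```

The resulting std::string holds the page's bytes as downloaded, so it can then be decoded with whatever charset the page declares.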
Re: How to convert 2 wchar_t to 1 wchar_t
The web page may be encoded in UTF-8 Unicode. That's not quite the same thing as using a code page.
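To illustrate the difference: in UTF-8 a Cyrillic letter takes two bytes, whereas in a single-byte code page such as Windows-1251 it takes one. The sketch below decodes UTF-8 sequences of up to three bytes into code points. decode_utf8_bmp is a made-up name, and the function does not reject overlong encodings, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points. Handles 1- to 3-byte
// sequences (the Basic Multilingual Plane). Truncated sequences, bad
// continuation bytes and 4-byte lead bytes each produce U+FFFD.
std::vector<std::uint32_t> decode_utf8_bmp(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

For example, the two bytes 0xD0 0x90 decode to the single code point U+0410 (Cyrillic capital A), which is the kind of 2:1 relationship described in the first post.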
Re: How to convert 2 wchar_t to 1 wchar_t
The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML
Once we know how the data is encoded, we can help with converting it to a wchar_t string.
Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8
gg
Re: How to convert 2 wchar_t to 1 wchar_t
Well, it's just a web page from a Ukrainian website that has the following:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
I need to take a subset of the text from this page and display it as Cyrillic within a CListCtrl.
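For what it's worth, picking the charset name out of a tag like that can be done with a simple text scan. This is a naive sketch, not an HTML parser: find_charset is a hypothetical helper, and it just returns whatever follows the first "charset=" in the text.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Return the value following the first "charset=" in the HTML,
// lower-cased, or an empty string if none is found.
std::string find_charset(const std::string& html)
{
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip an optional opening quote, as in <meta charset="utf-8">.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;

    // The name runs until the first character that cannot be part of it.
    std::size_t end = pos;
    while (end < lower.size() &&
           (std::isalnum(static_cast<unsigned char>(lower[end])) ||
            lower[end] == '-' || lower[end] == '_'))
        ++end;
    return lower.substr(pos, end - pos);
}
```

With the meta tag above this yields "windows-1251", which corresponds to Windows code page 1251.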
Re: How to convert 2 wchar_t to 1 wchar_t
The Wikipedia page on Windows-1251:
http://en.wikipedia.org/wiki/Windows-1251
You can use that page to create a 256-element array which maps each byte in the HTML document to its Unicode equivalent. Since every Unicode value on that page is below U+10000, each one fits in a single wchar_t; no surrogate pairs are needed to make them UTF-16.
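A partial sketch of that idea follows. Rather than a full 256-entry array, it covers ASCII, the contiguous block 0xC0..0xFF (which maps to U+0410..U+044F), and the extra Russian and Ukrainian letters; every other byte becomes U+FFFD. The function names are made up for illustration, and a complete version would fill in the remaining entries from the code page table.

```cpp
#include <string>

// Map one Windows-1251 byte to its Unicode code point. Only the
// entries listed here are covered; other bytes yield U+FFFD.
wchar_t cp1251_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return static_cast<wchar_t>(b);                   // ASCII, unchanged
    if (b >= 0xC0)
        return static_cast<wchar_t>(0x0410 + (b - 0xC0)); // capital A .. small ya
    switch (b) {
        case 0xA5: return 0x0490; // capital ghe with upturn
        case 0xA8: return 0x0401; // capital io
        case 0xAA: return 0x0404; // capital Ukrainian ie
        case 0xAF: return 0x0407; // capital yi
        case 0xB2: return 0x0406; // capital Byelorussian-Ukrainian i
        case 0xB3: return 0x0456; // small Byelorussian-Ukrainian i
        case 0xB4: return 0x0491; // small ghe with upturn
        case 0xB8: return 0x0451; // small io
        case 0xBA: return 0x0454; // small Ukrainian ie
        case 0xBF: return 0x0457; // small yi
        default:   return 0xFFFD; // not covered by this sketch
    }
}

// Decode a whole Windows-1251 byte string, one wchar_t per byte.
std::wstring decode_cp1251(const std::string& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(cp1251_to_unicode(static_cast<unsigned char>(c)));
    return out;
}
```

Because the mapping is strictly one byte to one wchar_t, the output always has the same length as the input.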
Re: How to convert 2 wchar_t to 1 wchar_t
Thanks.
I was under the impression that I could use something like the CA2WEX helper class, specifying the proper code-page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor.
Re: How to convert 2 wchar_t to 1 wchar_t
Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.
Re: How to convert 2 wchar_t to 1 wchar_t
Me neither, I'm just picking it up as I go along. I'm at home right now, so I will try these approaches tomorrow and let you know how I get on. Can't wait to see some Cyrillic text in my CListCtrl. Don't suppose many people wish for such a thing :-)
Re: How to convert 2 wchar_t to 1 wchar_t
Here's a more generic version of the conversion code samples:
Code:
#include <windows.h>
#include <string>
#include <vector>

// Convert a narrow string encoded in code page cp to a UTF-16 wide string.
std::wstring str_to_wstr(const std::string &str, UINT cp = CP_ACP)
{
    if (str.empty())
        return std::wstring(); // MultiByteToWideChar fails on a zero-length input

    const int src_len = static_cast<int>(str.length());
    // First call: ask for the required length, in wchar_t units
    int len = MultiByteToWideChar(cp, 0, str.c_str(), src_len, 0, 0);
    if (!len)
        return L"ErrorA2W";

    std::vector<wchar_t> wbuff(len + 1);
    // NOTE: this does not NULL terminate the string in wbuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!MultiByteToWideChar(cp, 0, str.c_str(), src_len, &wbuff[0], len))
        return L"ErrorA2W";

    return &wbuff[0];
}//str_to_wstr

// Convert a UTF-16 wide string to a narrow string encoded in code page cp.
std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    if (wstr.empty())
        return std::string(); // WideCharToMultiByte fails on a zero-length input

    const int src_len = static_cast<int>(wstr.length());
    // First call: ask for the required length, in bytes
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), src_len, 0, 0, 0, 0);
    if (!len)
        return "ErrorW2A";

    std::vector<char> abuff(len + 1);
    // NOTE: this does not NULL terminate the string in abuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), src_len, &abuff[0], len, 0, 0))
        return "ErrorW2A";

    return &abuff[0];
}//wstr_to_str
So you take a Cyrillic string, extracted from the HTML, and call:
Code:
std::wstring cyrillic_wstr = str_to_wstr(cyrillic_str, 1251);
gg
Re: How to convert 2 wchar_t to 1 wchar_t
Excellent, thanks. I will give these a try tomorrow when I'm back at my desk.
Re: How to convert 2 wchar_t to 1 wchar_t
This will give the same results using the ATL conversion classes (declared in <atlconv.h>):
Code:
std::wstring cyrillic_wstr(ATL::CA2WEX<>(cyrillic_str.c_str(), 1251));
gg