-
September 9th, 2009, 07:27 AM
#1
How to convert 2 wchar_t to 1 wchar_t
In a nutshell: I have a wstring in which each consecutive pair of wchar_t needs to be converted to a single wchar_t, which is then the character code to use for display purposes.
What is the best way to do the conversion?
How did it end up like this? I am downloading foreign web pages that use a different charset for their content, although the HTML file itself is ASCII. A browser would know to take two consecutive chars and convert them into a single wchar. I read the contents into a wstring, so each char is converted to a wchar_t. This much I do not want to change.
Thanks in advance.
-
September 9th, 2009, 12:25 PM
#2
Re: How to convert 2 wchar_t to 1 wchar_t
Could you attach a sample file that you are trying to read? And some code demonstrating how you are currently reading the file.
gg
-
September 9th, 2009, 01:01 PM
#3
Re: How to convert 2 wchar_t to 1 wchar_t
I think it's me getting confused. I'm reading in and parsing a downloaded web page, but it uses the Cyrillic character set. The file itself is still ASCII, as all web pages are?!
But as I understand it now, I need to switch code pages to map onto the Cyrillic characters. All of this is very confusing until you know how.
So I was originally thinking there was a 2:1 mapping of characters when a web page's content is in something other than our character set. But I was wrong: I think it's still 1:1, and I must pass a code-page parameter when converting these strings.
I'm taking a subset of this (Cyrillic) content and displaying it in a CListCtrl. It's just not happening yet.
-
September 9th, 2009, 01:11 PM
#4
Re: How to convert 2 wchar_t to 1 wchar_t
The web page may be encoded in UTF-8 Unicode. That's not quite the same thing as using a code page.
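To illustrate the difference: in UTF-8 a Cyrillic letter really does occupy two bytes that combine into one character, whereas in a single-byte code page it occupies one. The following is only a minimal sketch of UTF-8 decoding, not production code. The function name `utf8_to_wstring` is made up for this example, it handles 1- to 3-byte sequences only, it does not reject overlong forms, and anything it cannot decode becomes U+FFFD.

```cpp
#include <string>

// Hypothetical helper: decode UTF-8 (1- to 3-byte sequences) into code points.
// Bytes that cannot be decoded (stray continuation bytes, 4-byte leads,
// truncated sequences) are replaced with U+FFFD.
std::wstring utf8_to_wstring(const std::string& in)
{
    const wchar_t kReplacement = 0xFFFD;
    std::wstring out;
    std::string::size_type i = 0;

    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned int cp;
        std::string::size_type extra;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else { out += kReplacement; ++i; continue; }

        // Not enough bytes left for the whole sequence
        if (in.size() - i <= extra) { out += kReplacement; break; }

        bool ok = true;
        for (std::string::size_type k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }

        if (ok) { out += static_cast<wchar_t>(cp); i += extra + 1; }
        else    { out += kReplacement; ++i; }
    }
    return out;
}
```

For example, the two bytes 0xD0 0x96 decode to the single character U+0416, while the same letter in Windows-1251 is the single byte 0xC6.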
-
September 9th, 2009, 01:31 PM
#5
Re: How to convert 2 wchar_t to 1 wchar_t
The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML
Once we know how the data is encoded, we can help with converting it to a wchar_t string.
Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8
gg
-
September 9th, 2009, 03:27 PM
#6
Re: How to convert 2 wchar_t to 1 wchar_t
Well, it's just a web page from a Ukrainian website that has the following:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
I need to take a subset of the text from this page and display it as Cyrillic within a CListCtrl.
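A rough sketch of how the charset named in a meta tag like this one could be picked out of the raw HTML and turned into a Windows code-page number. Both function names are invented for this example, the search is a plain substring match rather than a real HTML parser, and the lookup table covers only a handful of charsets. A charset given in the HTTP Content-Type header, if there is one, would normally take priority over the meta tag.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: return the lowercase charset name that follows the
// first "charset=" in the document, or an empty string if there is none.
std::string find_html_charset(const std::string& html)
{
    std::string lower(html);
    for (std::string::size_type i = 0; i < lower.size(); ++i)
        lower[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(lower[i])));

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip an optional opening quote, as in <meta charset="utf-8">
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;

    // The name runs until the first character that cannot be part of it
    std::string::size_type end = pos;
    while (end < lower.size()) {
        const unsigned char c = static_cast<unsigned char>(lower[end]);
        if (!std::isalnum(c) && c != '-' && c != '_')
            break;
        ++end;
    }
    return lower.substr(pos, end - pos);
}

// Hypothetical helper: map a few charset names to Windows code-page
// identifiers. Returns 0 for names this small table does not know.
unsigned int charset_to_codepage(const std::string& name)
{
    if (name == "windows-1251") return 1251;
    if (name == "utf-8")        return 65001;  // CP_UTF8
    if (name == "koi8-r")       return 20866;
    if (name == "koi8-u")       return 21866;
    return 0;
}
```

The number returned for "windows-1251" is the same 1251 that a code-page-aware conversion function would take as its code page argument.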
-
September 9th, 2009, 03:38 PM
#7
Re: How to convert 2 wchar_t to 1 wchar_t
The wikipedia page on 1251:
http://en.wikipedia.org/wiki/Windows-1251
You can use that page to create a 256-element array which maps the characters in the HTML document to their Unicode equivalents. Since every Unicode value on that page is below U+10000, each one fits in a single wchar_t, so you should be able to drop them in directly with nothing special (no surrogate pairs) needed to make them UTF-16.
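A minimal sketch of that table-driven approach. The names `kCp1251High`, `cp1251_to_wchar` and `cp1251_to_wstring` are made up for this example. Instead of a full 256-element array it stores only bytes 0x80..0xBF and computes the rest: bytes below 0x80 are plain ASCII, and bytes 0xC0..0xFF are the contiguous block U+0410..U+044F. The table was transcribed from the published Windows-1251 layout, so check it against the reference before relying on it. Byte 0x98 is unassigned in Windows-1251, and mapping it to U+FFFD is a choice made here.

```cpp
#include <string>

// Unicode code points for Windows-1251 bytes 0x80..0xBF.
static const wchar_t kCp1251High[64] = {
    // 0x80..0x8F
    0x0402, 0x0403, 0x201A, 0x0453, 0x201E, 0x2026, 0x2020, 0x2021,
    0x20AC, 0x2030, 0x0409, 0x2039, 0x040A, 0x040C, 0x040B, 0x040F,
    // 0x90..0x9F (0x98 is unassigned; U+FFFD is used as a placeholder)
    0x0452, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0xFFFD, 0x2122, 0x0459, 0x203A, 0x045A, 0x045C, 0x045B, 0x045F,
    // 0xA0..0xAF
    0x00A0, 0x040E, 0x045E, 0x0408, 0x00A4, 0x0490, 0x00A6, 0x00A7,
    0x0401, 0x00A9, 0x0404, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x0407,
    // 0xB0..0xBF
    0x00B0, 0x00B1, 0x0406, 0x0456, 0x0491, 0x00B5, 0x00B6, 0x00B7,
    0x0451, 0x2116, 0x0454, 0x00BB, 0x0458, 0x0405, 0x0455, 0x0457
};

// Map one Windows-1251 byte to its Unicode code point.
wchar_t cp1251_to_wchar(unsigned char b)
{
    if (b < 0x80)
        return static_cast<wchar_t>(b);                   // ASCII range
    if (b >= 0xC0)
        return static_cast<wchar_t>(0x0410 + (b - 0xC0)); // U+0410..U+044F
    return kCp1251High[b - 0x80];
}

// Convert a whole Windows-1251 byte string, one byte to one wchar_t.
std::wstring cp1251_to_wstring(const std::string& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += cp1251_to_wchar(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

Note that the conversion is strictly one byte in, one wchar_t out, so the input has to be the raw bytes of the page and not a string that has already been widened.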
Last edited by Lindley; September 9th, 2009 at 03:41 PM.
-
September 9th, 2009, 03:45 PM
#8
Re: How to convert 2 wchar_t to 1 wchar_t
Thanks.
I was under the impression that I could use something like the CW2AEX helper class, specifying the proper code-page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor.
-
September 9th, 2009, 03:53 PM
#9
Re: How to convert 2 wchar_t to 1 wchar_t
Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.
-
September 9th, 2009, 03:58 PM
#10
Re: How to convert 2 wchar_t to 1 wchar_t
Me neither, I'm just picking it up as I go along. I'm at home right now, so I will try these approaches tomorrow and let you know how I get on. Can't wait to see some Cyrillic text in my CListCtrl. I don't suppose many people wish for such a thing :-)
-
September 9th, 2009, 04:04 PM
#11
Re: How to convert 2 wchar_t to 1 wchar_t
Here's a more generic version of the conversion code samples:
Code:
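#include <windows.h>
#include <string>
#include <vector>

// Convert a multibyte string encoded in code page cp to a UTF-16 wstring.
// Returns L"ErrorA2W" if the conversion fails.
std::wstring str_to_wstr(const std::string &str, UINT cp = CP_ACP)
{
    // MultiByteToWideChar fails on zero-length input, so handle it here
    if (str.empty())
        return std::wstring();

    const int src_len = static_cast<int>(str.length());

    // First call: ask how many wide characters the result needs
    int len = MultiByteToWideChar(cp, 0, str.c_str(), src_len, 0, 0);
    if (!len)
        return L"ErrorA2W";

    // Second call: perform the conversion into the buffer
    std::vector<wchar_t> wbuff(len);
    if (!MultiByteToWideChar(cp, 0, str.c_str(), src_len, &wbuff[0], len))
        return L"ErrorA2W";

    // Build from pointer + length, so no NUL terminator is required
    return std::wstring(&wbuff[0], len);
}//str_to_wstr

// Convert a UTF-16 wstring to a multibyte string in code page cp.
// Returns "ErrorW2A" if the conversion fails.
std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    // WideCharToMultiByte fails on zero-length input, so handle it here
    if (wstr.empty())
        return std::string();

    const int src_len = static_cast<int>(wstr.length());

    // First call: ask how many bytes the result needs
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), src_len, 0, 0, 0, 0);
    if (!len)
        return "ErrorW2A";

    // Second call: perform the conversion into the buffer
    std::vector<char> abuff(len);
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), src_len,
                             &abuff[0], len, 0, 0))
    {
        return "ErrorW2A";
    }//if

    // Build from pointer + length, so no NUL terminator is required
    return std::string(&abuff[0], len);
}//wstr_to_str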
So you take a Cyrillic string, extracted from the HTML, and call:
Code:
std::wstring cyrillic_wstr = str_to_wstr(cyrillic_str, 1251);
gg
Last edited by Codeplug; September 18th, 2009 at 08:24 AM.
Reason: bug fix
-
September 9th, 2009, 04:06 PM
#12
Re: How to convert 2 wchar_t to 1 wchar_t
Excellent, thanks. I will give these a try tomorrow when I'm back at my desk.
-
September 9th, 2009, 04:12 PM
#13
Re: How to convert 2 wchar_t to 1 wchar_t
This will give the same results using ATL tools:
Code:
std::wstring cyrillic_wstr = ATL::CA2WEX<>(cyrillic_str.c_str(), 1251);
gg