i have a char* containing this: C%3A%2F0
i would like to transform this to another variable which contains the string C:/0
is there an easy way to do this?
If you used CString, you could use Replace to replace %3A with ':', and so on. With a char*, you have to do these things manually.
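For the original question, the manual version is short. Here is a minimal sketch, assuming the input is percent-encoded the way URLs are. The names hex_value and url_decode are made up for illustration, and a malformed escape is simply copied through unchanged.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every %XX escape with the byte it names.
std::string url_decode(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        // An escape needs '%' plus two more characters inside the string
        if (in[i] == '%' && i + 2 < in.size())
        {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;   // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];     // ordinary character or malformed escape
    }
    return out;
}
```

Calling url_decode("C%3A%2F0") gives "C:/0". Note that the result is still a byte string: if the sender escaped UTF-8 bytes, the decoded string is UTF-8 too.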
thanks for the reply, i will clarify my situation:
i am sending a unicode string from flash to my c++ dll, but the method of transportation only supports narrow characters, so for example if i send the character é across, it appears in my char* as Ã©, i.e. two characters, i assume because unicode takes twice the space. I tried to convert my char* to wstring using the following function but it doesn't work: the string stays the same as it was before, i.e. Ã© instead of what i want, which is é...
Code:
std::wstring str_to_wstr( const std::string& str )
{
    std::wstring wstr( str.length()+1, 0 );
    MultiByteToWideChar( CP_ACP,
                         0,
                         str.c_str(),
                         str.length(),
                         &wstr[0],
                         str.length() );
    return wstr;
}
sorry for wasting your time, i am an ignorant dumbass - my function works if the first param is CP_UTF8!
Right. ;) BTW, there is an FAQ about this: http://www.codeguru.com/forum/showthread.php?t=231165.
>> &wstr[0],
This is not guaranteed to work before C++11, because until then the standard did not require std::wstring to store its characters contiguously. You need to use a vector for this.
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    // NOTE: we are assuming that the number of bytes in a UTF8 string is
    //       always >= to the number of wchar_t's required to represent that
    //       string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);
    if (!MultiByteToWideChar(CP_UTF8, 0,
                             utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length() + 1))
    {
        return L"Error";
    }//if
    return &wbuff[0];
}//utf8_to_wstr
gg
Thanks Codeplug, i used the suggested function and it did not work, then i used yours and it did work. One small thing i noticed is that it returns "Error" for a zero-length string, so i just stuck in
Code:
if(utf8 == "") return L"";
Thanks
Hi Guys,
After your helpful comments i came up with a system which i think works. I convert to wstring, then back to string! My code is below. Is there a better way to do this?
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    if(utf8 == "")
        return L"";
    std::vector<wchar_t> wbuff(utf8.length() + 1);
    if (!MultiByteToWideChar(CP_UTF8, 0,
                             utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length() + 1))
    {
        DWORD e = ::GetLastError();
        switch(e)
        {
        case ERROR_INSUFFICIENT_BUFFER:
        case ERROR_INVALID_FLAGS:
        case ERROR_INVALID_PARAMETER:
        case ERROR_NO_UNICODE_TRANSLATION:
            return L"Error";
        default:
            break;
        }
    }
    return &wbuff[0];
}

std::string wstr_to_str( const std::wstring& wstr )
{
    if(wstr == L"")
        return "";
    std::vector<char> buff(wstr.length() + 1);
    size_t size = wstr.length();
    //std::string str( size + 1, 0 );
    WideCharToMultiByte( CP_ACP,
                         0,
                         wstr.c_str(),
                         size,
                         &buff[0],
                         size+1,
                         NULL,
                         NULL );
    return &buff[0];
}

string& cleanString(string& str)
{
    wstring ws = utf8_to_wstr(str);
    str = wstr_to_str(ws);
    return str;
}
Why don't you let MultiByteToWideChar tell you how many characters it needs? Did you look at the FAQ I was pointing to?
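For reference, the "ask first, then convert" pattern could look roughly like this. It is only a sketch and not the code from the FAQ: the function name utf8_to_wstr_sized is made up, and returning an empty string on failure is just one possible choice. Passing 0 as the output size makes MultiByteToWideChar return the number of wchar_t's it needs.

```cpp
#ifdef _WIN32   // Windows-only sketch
#include <windows.h>
#include <string>
#include <vector>

// Hypothetical helper: UTF-8 to UTF-16, sized by the API itself.
std::wstring utf8_to_wstr_sized(const std::string &utf8)
{
    // The API fails when the input length is 0, so handle that first
    if (utf8.empty())
        return std::wstring();

    const int in_len = static_cast<int>(utf8.length());

    // First call: output size 0 means "tell me how many wchar_t's are needed"
    const int needed = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), in_len,
                                           NULL, 0);
    if (needed <= 0)
        return std::wstring();   // failed; GetLastError() has the reason

    // Second call: convert into a buffer of exactly that size
    std::vector<wchar_t> wbuff(needed);
    if (!MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), in_len,
                             &wbuff[0], needed))
        return std::wstring();

    // Build from the range so no terminating NUL is required
    return std::wstring(wbuff.begin(), wbuff.end());
}
#endif
```

This costs a second API call, but it removes the need to reason about how many bytes each encoding uses per character.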
yes, i tried to use this function, and the string looked OK in the variable window, but i got some errors when doing operations with this new string. The errors were strange to say the least. I used Codeplug's function and everything works fine. Unfortunately i have no time to spare to investigate further...
Why are you trying to convert a UTF8 string into an ANSI/ASCII string? Any Unicode characters outside the current ANSI code page will be replaced with a "default" character - usually '?' or '_'.
Here are updated functions with updated comments:
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    // MultiByteToWideChar fails when the input length is 0, so an empty
    // string is handled up front instead of being reported as an error
    if (utf8.empty())
        return std::wstring();

    // NOTE: we are assuming that the number of bytes in a UTF8 string is
    //       always >= to the number of wchar_t's required to represent that
    //       string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);

    // NOTE: this does not NULL terminate the string in wbuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.length(),
                             &wbuff[0], (int)utf8.length()))
    {
        return L"ErrorW";
    }//if

    return &wbuff[0];
}//utf8_to_wstr

std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    // WideCharToMultiByte also fails when the input length is 0
    if (wstr.empty())
        return std::string();

    // First call with a zero-sized output buffer returns the required size
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), (int)wstr.length(),
                                  0, 0, 0, 0);
    if (!len)
        return "ErrorA";

    std::vector<char> abuff(len + 1);

    // NOTE: this does not NULL terminate the string in abuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), (int)wstr.length(),
                             &abuff[0], len, 0, 0))
    {
        return "ErrorA";
    }//if

    return &abuff[0];
}//wstr_to_str
gg
I am using Flash for my interface, or Flex to be specific, and i had a string in Flash containing an accented e character: Calle Jaén-20070505.jpg. I use a technology called Screenweaver to transfer this string to my c++ dll. But when i look at the string in my dll, instead of the accented e character, i get the two characters Ã© instead! I have no control over this Screenweaver technology, and they use narrow chars. If i use the two functions to convert to unicode and back again, the accent character appears! Shamefully i don't know why, so if anyone can enlighten me as to what's going on i would appreciate it.
é is "Latin Small Letter E With Acute".
Its Unicode code point is U+00E9.
The UTF8 encoding of that character is 0xC3,0xA9 (2 bytes).
If you look at code page 1252 (which is the default Ansi code page on my machine) and look up the glyph assigned to 0xC3 and 0xA9, you will see Ã and ©.
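The two bytes can be checked by hand with a few lines of portable C++. This is only a sketch to illustrate the bit layout of UTF-8; the function name encode_utf8 is made up, and it does not reject surrogates or values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
std::string encode_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80)
    {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00E9 the top bits (0xE9 >> 6 = 3) give 0xC0 | 3 = 0xC3, and the low six bits (0x29) give 0x80 | 0x29 = 0xA9, so encode_utf8(0xE9) is the byte pair 0xC3,0xA9.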
>> If i use the two functions to convert to unicode and back again, the accent character appears!
So utf8_to_wstr() is Unicode to Unicode. It just changes the encoding from UTF8 to UTF16LE. wstr_to_str() is Unicode (UTF16LE) to Ansi using the given code page. So the round trip essentially does "\xC3\xA9" --> L"\x00E9" --> "\xE9". (0xE9 is the Ansi character code for U+00E9 under code page 1252).
gg
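The round trip described above can also be imitated without the Windows API, which makes the two steps easier to see. This is a rough sketch with made-up function names. It decodes well-formed UTF-8 only (overlong forms are not rejected), and it handles only the part of code page 1252 where the byte value equals the code point (below 0x80 and 0xA0 to 0xFF); everything else becomes '?'.

```cpp
#include <string>
#include <vector>

// Step 1 (hypothetical helper): UTF-8 bytes to Unicode code points.
// Malformed sequences come out as U+FFFD.
std::vector<unsigned long> utf8_to_code_points(const std::string &utf8)
{
    std::vector<unsigned long> out;
    std::string::size_type i = 0;
    while (i < utf8.size())
    {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        int extra = 0;   // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // stray byte
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k)
        {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                ok = false;   // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFDul);
    }
    return out;
}

// Step 2 (hypothetical helper): code points to single bytes, covering only
// the range where code page 1252 and Unicode agree.
std::string code_points_to_cp1252_subset(const std::vector<unsigned long> &cps)
{
    std::string out;
    for (std::vector<unsigned long>::size_type i = 0; i < cps.size(); ++i)
    {
        const unsigned long cp = cps[i];
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
            out += static_cast<char>(static_cast<unsigned char>(cp));
        else
            out += '?';   // no mapping in this simplified table
    }
    return out;
}
```

Feeding "Ja\xC3\xA9n" through both steps yields the code points J, a, U+00E9, n and then the four bytes "Ja\xE9n", which is the same shortening from two bytes to one that the Windows functions perform.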
thanks i understand better now! So the way i have managed is actually quite a good way to convert utf-8 to ansi then?
It's a correct way in Windows - yes.
Just keep in mind that if you encounter a Unicode character that doesn't map to the same ACP (ansi code page) character, then you won't have the same "string". Your example looks like a file name - so the worst case scenario would be "file not found".
gg
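If the "file not found" case matters, WideCharToMultiByte can report whether it had to substitute a character, through its last parameter. The following is a sketch only; the function name wstr_to_ansi_checked and the bool out-parameter are made up for illustration. Two caveats: the last parameter must be NULL for CP_UTF8, and with flags set to 0 the API may pick a similar-looking "best fit" character without setting the flag, so WC_NO_BEST_FIT_CHARS may be worth considering.

```cpp
#ifdef _WIN32   // Windows-only sketch
#include <windows.h>
#include <string>
#include <vector>

// Hypothetical helper: UTF-16 to an ANSI code page, reporting through
// 'lossy' whether any character was replaced by the default character.
// Do not call this with cp == CP_UTF8 (the flag pointer must be NULL there).
std::string wstr_to_ansi_checked(const std::wstring &wstr, bool &lossy,
                                 UINT cp = CP_ACP)
{
    lossy = false;
    if (wstr.empty())
        return std::string();

    const int in_len = static_cast<int>(wstr.length());

    // First call: ask for the required number of bytes
    const int needed = WideCharToMultiByte(cp, 0, wstr.c_str(), in_len,
                                           NULL, 0, NULL, NULL);
    if (needed <= 0)
    {
        lossy = true;   // treat an outright failure as lossy as well
        return std::string();
    }

    std::vector<char> abuff(needed);
    BOOL used_default = FALSE;

    // Second call: convert and collect the "default character used" flag
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), in_len,
                             &abuff[0], needed, NULL, &used_default))
    {
        lossy = true;
        return std::string();
    }

    lossy = (used_default != FALSE);
    return std::string(abuff.begin(), abuff.end());
}
#endif
```

A caller could then refuse to open the file, or fall back to the wide-character file APIs, whenever lossy comes back true.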