-
December 5th, 2008, 06:19 AM
#1
convert string containing escape sequences to normal string
i have a char* containing this: C%3A%2F0
i would like to transform this to another variable which contains the string C:/0
is there an easy way to do this?
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925
-
December 5th, 2008, 06:34 AM
#2
Re: convert string containing escape sequences to normal string
If you used CString, you could use Replace to replace %3A with ':', and so on. With a char*, you have to do these things manually.
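For reference, doing it "manually" could look something like the sketch below. It is not from any poster in this thread: the names percent_decode and hex_value are made up for illustration, and it assumes that malformed escapes (a '%' not followed by two hex digits) should simply be copied through unchanged.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every %XX escape with the byte it encodes.
// Assumption: a '%' that is not followed by two hex digits is kept as-is.
std::string percent_decode(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

With this, percent_decode("C%3A%2F0") gives "C:/0". Note that the result is still a sequence of raw bytes: if the sender escaped UTF-8 text, the decoded bytes are UTF-8 and still need a separate encoding conversion.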
-
December 5th, 2008, 06:43 AM
#3
Re: convert string containing escape sequences to normal string
thanks for the reply, i will clarify my situation:
i am sending a unicode string from flash to my c++ dll, but the method of transportation only supports narrow characters, so for example if i send the character é across, it appears in my char* as Ã©, i.e. two characters, i assume because unicode takes twice the space. I tried to convert my char* to wstring using the following function but it doesn't work: the string is just the same as it was before, i.e. Ã© instead of what i want, which is é...
Code:
std::wstring str_to_wstr( const std::string& str )
{
    std::wstring wstr( str.length()+1, 0 );
    MultiByteToWideChar( CP_ACP,
                         0,
                         str.c_str(),
                         str.length(),
                         &wstr[0],
                         str.length() );
    return wstr;
}
-
December 5th, 2008, 07:04 AM
#4
Re: convert string containing escape sequences to normal string
sorry for wasting your time, i am an ignorant dumbass - my function works if the first param is CP_UTF8!
-
December 5th, 2008, 07:18 AM
#5
Re: convert string containing escape sequences to normal string
-
December 5th, 2008, 08:56 AM
#6
Re: convert string containing escape sequences to normal string
>> &wstr[0],
This is non-standard and will result in undefined behavior. You need to use a vector for this.
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    // NOTE: we are assuming that the number of bytes in a UTF8 string is
    // always >= the number of wchar_t's required to represent that
    // string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);
    if (!MultiByteToWideChar(CP_UTF8, 0,
                             utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length() + 1))
    {
        return L"Error";
    }//if
    return &wbuff[0];
}//utf8_to_wstr
gg
-
December 8th, 2008, 04:31 AM
#7
Re: convert string containing escape sequences to normal string
Thanks Codeplug, i used the suggested function and it did not work, then i used yours and it did work. One small thing i noticed is that it returns "Error" for a zero-length string, so i just stuck in
Code:
if(utf8 == "") return L"";
Thanks
-
December 8th, 2008, 04:32 AM
#8
Re: convert string containing escape sequences to normal string
Hi Guys,
After your helpful comments i came up with a system which i think works. I convert to wstring, then back to string! My code is below. Is there a better way to do this?
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    if(utf8 == "")
        return L"";
    std::vector<wchar_t> wbuff(utf8.length() + 1);
    if (!MultiByteToWideChar(CP_UTF8, 0,
                             utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length() + 1))
    {
        DWORD e = ::GetLastError();
        switch(e)
        {
        case ERROR_INSUFFICIENT_BUFFER:
        case ERROR_INVALID_FLAGS:
        case ERROR_INVALID_PARAMETER:
        case ERROR_NO_UNICODE_TRANSLATION:
            return L"Error";
        default:
            break;
        }
    }
    return &wbuff[0];
}
std::string wstr_to_str( const std::wstring& wstr )
{
    if(wstr == L"")
        return "";
    std::vector<char> buff(wstr.length() + 1);
    size_t size = wstr.length();
    //std::string str( size + 1, 0 );
    WideCharToMultiByte( CP_ACP,
                         0,
                         wstr.c_str(),
                         size,
                         &buff[0],
                         size+1,
                         NULL,
                         NULL );
    return &buff[0];
}
std::string& cleanString(std::string& str)
{
    std::wstring ws = utf8_to_wstr(str);
    str = wstr_to_str(ws);
    return str;
}
Last edited by dave2k; December 8th, 2008 at 05:01 AM.
-
December 8th, 2008, 06:25 AM
#9
Re: convert string containing escape sequences to normal string
Why don't you let MultiByteToWideChar tell you how many characters it needs? Did you look at the FAQ I was pointing to?
-
December 8th, 2008, 10:02 AM
#10
Re: convert string containing escape sequences to normal string
yes, i tried to use this function, and the string looked OK in the variable window, but i got some errors when doing operations with this new string. The errors were strange, to say the least. I used Codeplug's function and everything works fine. Unfortunately i have no time to spare to investigate further...
-
December 8th, 2008, 12:08 PM
#11
Re: convert string containing escape sequences to normal string
Why are you trying to convert a UTF8 string into an ANSI/ASCII string? Any Unicode characters outside the current ANSI code page will be replaced with a "default" character - usually '?' or '_'.
Here are updated functions with updated comments:
Code:
std::wstring utf8_to_wstr(const std::string &utf8)
{
    // NOTE: we are assuming that the number of bytes in a UTF8 string is
    // always >= the number of wchar_t's required to represent that
    // string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);
    // NOTE: this does not NULL terminate the string in wbuff, but this is ok
    // since it was zero-initialized in the vector constructor
    if (!MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length()))
    {
        return L"ErrorW";
    }//if
    return &wbuff[0];
}//utf8_to_wstr
std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), wstr.length(),
                                  0, 0, 0, 0);
    if (!len)
        return "ErrorA";
    std::vector<char> abuff(len + 1);
    // NOTE: this does not NULL terminate the string in abuff, but this is ok
    // since it was zero-initialized in the vector constructor
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), wstr.length(),
                             &abuff[0], len, 0, 0))
    {
        return "ErrorA";
    }//if
    return &abuff[0];
}//wstr_to_str
gg
-
December 9th, 2008, 05:31 AM
#12
Re: convert string containing escape sequences to normal string
I am using Flash for my interface, or Flex to be specific, and i had a string in Flash containing an accented e character: Calle Jaén-20070505.jpg. I use a technology called screanweaver to transfer this string to my c++ dll. But when i look at the string in my dll, instead of the accented e character, i get the two characters Ã© instead! I have no control over this screanweaver technology, and they use narrow chars. If i use the two functions to convert to unicode and back again, the accent character appears! Shamefully i don't know why, so if anyone can enlighten me as to what's going on i would appreciate it.
-
December 9th, 2008, 11:43 AM
#13
Re: convert string containing escape sequences to normal string
é is "Latin Small Letter E With Acute".
Its Unicode code point is U+00E9.
The UTF8 encoding of that character is 0xC3,0xA9 (2 bytes).
If you look at code page 1252 (which is the default Ansi code page on my machine) and look up the glyph assigned to 0xC3 and 0xA9, you will see Ã and ©.
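To see where the two bytes 0xC3 and 0xA9 come from, here is a minimal sketch of UTF-8 encoding for a single code point. The function name encode_utf8 is made up for this illustration and is not part of the Windows API or of the code posted in this thread; it assumes that surrogates and values above U+10FFFF should produce an empty string.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Assumption: invalid input (surrogates, values above U+10FFFF) yields "".
std::string encode_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80)
    {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogates are not valid code points to encode
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x10FFFF)
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00E9 the value falls in the two-byte range: the top bits (0xE9 >> 6 = 3) give 0xC0 | 3 = 0xC3, and the low six bits (0x29) give 0x80 | 0x29 = 0xA9. A viewer that treats the buffer as code page 1252 then shows each of those bytes as its own character.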
>> If i use the two functions to convert to unicode and back again, the accent character appears!
So utf8_to_wstr() is Unicode to Unicode. It just changes the encoding from UTF8 to UTF16LE. wstr_to_str() is Unicode (UTF16LE) to Ansi using the given code page. So the round trip essentially does "\xC3\xA9" --> L"\x00E9" --> "\xE9". (0xE9 is the Ansi character code for U+00E9 under code page 1252).
gg
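The round trip described in the post above can also be sketched in portable C++, without the Windows API. This is only an illustration under stated assumptions: decode_utf8 and to_latin1 are made-up names, ISO-8859-1 (Latin-1) stands in for code page 1252 (the two agree for 0xA0 to 0xFF, which covers é, but differ for 0x80 to 0x9F), and the decoder is minimal (malformed sequences become U+FFFD; overlong forms are not rejected).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into Unicode code points.
// Assumption: a malformed sequence is replaced by U+FFFD.
std::vector<unsigned long> decode_utf8(const std::string &s)
{
    std::vector<unsigned long> out;
    std::string::size_type i = 0;
    while (i < s.size())
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        unsigned long cp = 0;
        int extra = 0; // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; } // stray byte
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k)
        {
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            {
                ok = false; // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFDUL);
    }
    return out;
}

// Hypothetical helper: maps code points to Latin-1 bytes (a stand-in for
// code page 1252). Anything above U+00FF becomes '?' and sets *lossy.
std::string to_latin1(const std::vector<unsigned long> &cps, bool *lossy)
{
    std::string out;
    if (lossy) *lossy = false;
    for (std::size_t k = 0; k < cps.size(); ++k)
    {
        if (cps[k] <= 0xFF)
        {
            out += static_cast<char>(cps[k]);
        }
        else
        {
            out += '?';
            if (lossy) *lossy = true;
        }
    }
    return out;
}
```

Feeding "\xC3\xA9" through decode_utf8 yields the single code point 0xE9, and to_latin1 turns that into the single byte 0xE9. A character with no Latin-1 equivalent, such as the euro sign in UTF-8 ("\xE2\x82\xAC"), comes out as '?' with the lossy flag set, which is the kind of substitution to watch for when the string is a file name.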
-
December 9th, 2008, 12:36 PM
#14
Re: convert string containing escape sequences to normal string
thanks i understand better now! So the way i have managed is actually quite a good way to convert utf-8 to ansi then?
-
December 9th, 2008, 12:58 PM
#15
Re: convert string containing escape sequences to normal string
It's a correct way in Windows - yes.
Just keep in mind that if you encounter a Unicode character that doesn't map to the same ACP (ansi code page) character, then you won't have the same "string". Your example looks like a file name - so the worst case scenario would be "file not found".
gg