convert string containing escape sequences to normal string

**dave2k** · December 5th, 2008, 06:19 AM

I have a char* containing this: C%3A%2F0
I would like to transform this into another variable which contains the string C:/0

Is there an easy way to do this?

**cilu** · December 5th, 2008, 06:34 AM

If you used CString, you could use Replace to replace %3A with :, and so on. With a char*, you have to do these things manually.
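
Doing it manually for the %XX escapes in the first post could look roughly like the sketch below. It assumes the input is percent-encoded in the URL style, and the names percent_decode and hex_value are made up for this example rather than taken from any library. Malformed escapes are simply copied through untouched.

```cpp
#include <string>

// Hypothetical helper (not from any library): value of one hex digit, or -1.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every well-formed %XX escape with the byte
// it encodes and copies everything else through unchanged.
std::string percent_decode(const std::string &in)
{
    std::string out;
    out.reserve(in.size());

    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        // A complete escape needs two more characters after the '%'.
        if (in[i] == '%' && i + 2 < in.size())
        {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

With the string from the first post, percent_decode("C%3A%2F0") gives C:/0. The result is still a sequence of raw bytes, so any non-ASCII characters in it keep whatever encoding the sender used.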

**dave2k** · December 5th, 2008, 06:43 AM

Thanks for the reply, I will clarify my situation:

I am sending a Unicode string from Flash to my C++ DLL, but the method of transportation only supports narrow characters. For example, if I send the character é across, it appears in my char* as Ã©, i.e. two characters, I assume because Unicode takes twice the space. I tried to convert my char* to a wstring using the following function, but it doesn't work: the string stays the same as it was before, i.e. Ã© instead of the é I want...

Code:

std::wstring str_to_wstr( const std::string& str )
{
	std::wstring wstr( str.length()+1, 0 );

	MultiByteToWideChar( CP_ACP,
		0,
		str.c_str(),
		str.length(),
		&wstr[0],
		str.length() );
	return wstr;
}

**dave2k** · December 5th, 2008, 07:04 AM

Sorry for wasting your time, I am an ignorant dumbass: my function works if the first param is CP_UTF8!

**cilu** · December 5th, 2008, 07:18 AM

Right.

BTW, there is an FAQ about this: http://www.codeguru.com/forum/showthread.php?t=231165.

**Codeplug** · December 5th, 2008, 08:56 AM

>> &wstr[0],
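In C++03, std::wstring is not guaranteed to store its characters contiguously, so writing through &wstr[0] is non-standard and can result in undefined behavior. You need to use a vector for this.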

Code:

std::wstring utf8_to_wstr(const std::string &utf8)
{
    // NOTE: we are assuming that the number of bytes in a UTF8 string is 
    //       always >= to the number of wchar_t's required to represent that 
    //       string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);

    if (!MultiByteToWideChar(CP_UTF8, 0, 
                             utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length() + 1))
    {
        return L"Error";
    }//if

    return &wbuff[0];
}//utf8_to_wstr

gg

**dave2k** · December 8th, 2008, 04:31 AM

Thanks Codeplug, I used the suggested function and it did not work, then I used yours and it did work. One small thing I noticed is that it returns "Error" for a zero-length string, so I just stuck in

Code:

if(utf8 == "") return L"";

Thanks

**dave2k** · December 8th, 2008, 04:32 AM

Hi Guys,

After your helpful comments I came up with a system which I think works. I convert to wstring, then back to string! My code is below. Is there a better way to do this?

Code:

std::wstring utf8_to_wstr(const std::string &utf8)
{
	if(utf8 == "")
		return L"";

	std::vector<wchar_t> wbuff(utf8.length() + 1);

	if (!MultiByteToWideChar(CP_UTF8, 0, 
		utf8.c_str(), utf8.length(),
		&wbuff[0], utf8.length() + 1))
	{
		DWORD e = ::GetLastError();
		switch(e)
		{
			case ERROR_INSUFFICIENT_BUFFER:
			case ERROR_INVALID_FLAGS:
			case ERROR_INVALID_PARAMETER:
			case ERROR_NO_UNICODE_TRANSLATION:
				return L"Error";
			default:
				break;
		}
	}

	return &wbuff[0];
}

std::string wstr_to_str( const std::wstring& wstr )
{
	if(wstr == L"")
		return "";

	std::vector<char> buff(wstr.length() + 1);

	size_t size = wstr.length();
	//std::string str( size + 1, 0 );

	WideCharToMultiByte( CP_ACP,
		0,
		wstr.c_str(),
		size,
		&buff[0],
		size+1,
		NULL,
		NULL );
	return &buff[0];
}


string& cleanString(string& str)
{
	wstring ws = utf8_to_wstr(str);
	str = wstr_to_str(ws);
	return str;
}

**cilu** · December 8th, 2008, 06:25 AM

Why don't you let MultiByteToWideChar tell you how many characters it needs? Did you look at the FAQ I was pointing to?
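
A minimal sketch of that two-call pattern, in case it helps: call MultiByteToWideChar once with a zero-sized output buffer to get the required length, then call it again to convert. The function name utf8_to_wstr_sized is made up for this example, and returning an empty string on failure is just one possible choice (it cannot be told apart from an empty input). The whole thing is wrapped in an _WIN32 check because the API only exists on Windows.

```cpp
#ifdef _WIN32  // MultiByteToWideChar is a Win32 API, so this is Windows-only
#include <windows.h>
#include <string>
#include <vector>

// Hypothetical variant of utf8_to_wstr that sizes its buffer by asking the API.
std::wstring utf8_to_wstr_sized(const std::string &utf8)
{
    if (utf8.empty())
        return std::wstring();  // a zero-length input makes the API call fail

    const int in_len = static_cast<int>(utf8.length());

    // First call: an output size of 0 means "only report the required size".
    const int needed = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), in_len,
                                           NULL, 0);
    if (needed == 0)
        return std::wstring();  // conversion failed; see GetLastError()

    // Second call: convert into a buffer of exactly that many wchar_t's.
    std::vector<wchar_t> wbuff(needed);
    if (!MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), in_len,
                             &wbuff[0], needed))
        return std::wstring();

    // No terminator is written when an explicit input length is passed,
    // so build the result from the buffer and its length.
    return std::wstring(&wbuff[0], wbuff.size());
}
#endif
```

This avoids guessing the buffer size and handles the zero-length case up front.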

**dave2k** · December 8th, 2008, 10:02 AM

Yes, I tried to use this function, and the string looked OK in the variable window, but I got some errors when doing operations with this new string. The errors were strange to say the least. I used Codeplug's function and everything works fine. Unfortunately I have no time to spare to investigate further...

**Codeplug** · December 8th, 2008, 12:08 PM

Why are you trying to convert a UTF8 string into an ANSI/ASCII string? Any Unicode characters outside the current ANSI code page will be replaced with a "default" character - usually '?' or '_'.

Here are updated functions with updated comments:

Code:

std::wstring utf8_to_wstr(const std::string &utf8)
{
    // NOTE: we are assuming that the number of bytes in a UTF8 string is 
    //       always >= to the number of wchar_t's required to represent that 
    //       string in UTF16LE - which should hold true
    std::vector<wchar_t> wbuff(utf8.length() + 1);

    // NOTE: this does not NULL terminate the string in wbuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), utf8.length(),
                             &wbuff[0], utf8.length()))
    {
        return L"ErrorW";
    }//if

    return &wbuff[0];
}//utf8_to_wstr

std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), wstr.length(), 
                                  0, 0, 0, 0);
    if (!len)
        return "ErrorA";

    std::vector<char> abuff(len + 1);

    // NOTE: this does not NULL terminate the string in abuff, but this is ok
    //       since it was zero-initialized in the vector constructor
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), wstr.length(), 
                             &abuff[0], len, 0, 0))
    {
        return "ErrorA";
    }//if

    return &abuff[0];
}//wstr_to_str

gg

**dave2k** · December 9th, 2008, 05:31 AM

I am using Flash for my interface, or Flex to be specific, and I had a string in Flash containing an accented e character: Calle Jaén-20070505.jpg. I use a technology called Screenweaver to transfer this string to my C++ DLL. But when I look at the string in my DLL, instead of the accented e character, I get the two characters Ã© instead! I have no control over this Screenweaver technology, and it uses narrow chars. If I use the two functions to convert to Unicode and back again, the accented character appears! Shamefully I don't know why, so if anyone can enlighten me as to what's going on I would appreciate it.

**Codeplug** · December 9th, 2008, 11:43 AM

é is "Latin Small Letter E With Acute".
Its Unicode code point is U+00E9.
The UTF8 encoding of that character is 0xC3,0xA9 (2 bytes).
If you look at code page 1252 (which is the default Ansi code page on my machine) and look up the glyph assigned to 0xC3 and 0xA9, you will see Ã and ©.

>> If i use the two functions to convert to unicode and back again, the accent character appears!
So utf8_to_wstr() is Unicode to Unicode. It just changes the encoding from UTF8 to UTF16LE. wstr_to_str() is Unicode (UTF16LE) to Ansi using the given code page. So the round trip essentially does "\xC3\xA9" --> L"\x00E9" --> "\xE9". (0xE9 is the Ansi character code for U+00E9 under code page 1252).
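
The byte arithmetic behind that round trip can be sketched portably, without the Win32 calls. This is only an illustration and not what the API does internally: the name utf8_to_latin1 is made up, malformed input is not validated, and ISO-8859-1 (Latin-1) stands in for code page 1252 as an assumption. The two agree for U+00A0 through U+00FF, which covers é, but differ in the 0x80 to 0x9F range.

```cpp
#include <string>

// Hypothetical, simplified stand-in for the round trip: decodes UTF-8 into
// code points and re-encodes each one as a single Latin-1 byte.
std::string utf8_to_latin1(const std::string &utf8)
{
    std::string out;
    for (std::string::size_type i = 0; i < utf8.size(); )
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;
        std::string::size_type len;

        // The lead byte gives the sequence length and the top payload bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xC0) { cp = 0xFFFD;      len = 1; }  // stray continuation byte
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }

        // Each continuation byte contributes its low 6 bits.
        for (std::string::size_type k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);

        // Latin-1 byte values equal the code points U+0000..U+00FF; anything
        // else has no equivalent and becomes a default character.
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
        i += len;
    }
    return out;
}
```

For 0xC3,0xA9 the lead byte contributes 0x03 and the continuation byte 0x29, so the code point is (0x03 << 6) | 0x29 = 0xE9, which is written out as the single byte 0xE9. A character such as the euro sign (U+20AC) has no Latin-1 byte and comes out as '?'.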

gg

**dave2k** · December 9th, 2008, 12:36 PM

Thanks, I understand better now! So the way I have managed it is actually quite a good way to convert UTF-8 to ANSI then?

**Codeplug** · December 9th, 2008, 12:58 PM

It's a correct way in Windows - yes.

Just keep in mind that if you encounter a Unicode character that has no equivalent in the ACP (ANSI code page), it will be replaced and you won't have the same "string". Your example looks like a file name - so the worst case scenario would be "file not found".

gg
