Click to See Complete Forum and Search --> : String conversion


PredicateNormative
May 6th, 2008, 11:35 AM
Ok, I wanted I nice way to convert a string to lower case, so I thought to myself "Why reinvent the wheel? A quick google search will surely reveal something useful.", so I had a quick look and was truly alarmed to see what most people seem to be doing and suggesting to others. It is basically this:


void StringToLower(std::string& str)
{
std::transform(str.begin(), str.end(), str.begin(), tolower);
}


This looks bad to me, really bad! To my understanding it is attempting to overwrite the underlying const data buffer of the string in place! The version that I have implemented is as follows:


std::string StringToLower(std::string& str)
{
std::vector<char> vec(str.begin(), str.end());
std::transform(vec.begin(), vec.end(), vec.begin(), tolower);
return std::string(vec.begin(),vec.end());
}

but this is grossly inefficient. Do any of you know I nice way to convert a string to lower case?

jlou
May 6th, 2008, 11:50 AM
Why do you think it is changing the underlying data buffer? That is only bad when you expose that buffer with c_str() and make changes directly to it. Here, tolower is called on each character in the string through the string's interface, so it is not a problem. You can use your first example.

PredicateNormative
May 6th, 2008, 12:07 PM
Why do you think it is changing the underlying data buffer? That is only bad when you expose that buffer with c_str() and make changes directly to it. Here, tolower is called on each character in the string through the string's interface, so it is not a problem. You can use your first example.
I'm not convinced. The inner workings of transform under VS2005 are:

for (; _First != _Last; ++_First, ++_Dest)
*_Dest = _Func(*_First);

_Dest is the string iterator, which, if you go through the header code, either ends up as a char* or a char[] depending on the circumstances of the string. What this means is that transform is directly acting on the underlying buffer of the string, which seems wrong semantically if nothing else.

GCDEF
May 6th, 2008, 12:16 PM
I'm not understanding why that's a problem.

jlou
May 6th, 2008, 12:19 PM
I'm not convinced.As long as you're accessing through the iterator you're ok. Of course it will end up being a char* and of course the underlying buffer of the string is changed, you are changing the string. The only problem you have is when you change the underlying buffer outside of the string's interface. Here you are doing it through iterators, which are part of the interface and therefore safe.

The string class has an operator[] as part of it's interface. Do you also believe this is unsafe:std::string greeting("Hello World");
greeting[0] = "J";What about this:greeting = "Goodbye";Each of these things is just changing the string through its interface, and all are perfectly safe and acceptable.

exterminator
May 6th, 2008, 12:31 PM
If you are going to use the second form, which might be necessary at times the input string is a const object or a literal, then make the argument be a reference to a *const* object.

Both forms are acceptable.

By the way, why do you use a vector<char> and then make out a std::string out of it? Why not use std::string to begin with?

kempofighter
May 6th, 2008, 04:40 PM
"This looks bad to me, really bad! To my understanding it is attempting to overwrite the underlying const data buffer of the string in place!"

I believe a non-const iterator is being used to change the data. The purpose of this operation is to change every character of the string. They each have to be modified somehow. The std:string owns it's own data buffer so why is it a problem that the std::string instance is able to change it's own data through an iterator?

I feel your pain though. When I first started writing code that uses the std::string class, it seemed peculiar that there was no tolower member function. I thought that was rather odd but eventually I just got used to it. although it works, it would have been nice if there was just a member function to do this for std::string. I think that would make code a bit easier to read and would just seem more intuitive to me; but I digress.

Paul McKenzie
May 6th, 2008, 06:37 PM
it seemed peculiar that there was no tolower member function.That may be because in the world of spoken languages, the definition of "lower case" may differ from what English and similar languages believe it should be, and I guess the standards committee didn't want to get involved in creating a function to work for every locale.

Regards,

Paul McKenzie

Paul McKenzie
May 6th, 2008, 06:50 PM
Ok, I wanted I nice way to convert a string to lower case, so I thought to myself "Why reinvent the wheel? A quick google search will surely reveal something useful.", so I had a quick look and was truly alarmed to see what most people seem to be doing and suggesting to others. It is basically this:


void StringToLower(std::string& str)
{
std::transform(str.begin(), str.end(), str.begin(), tolower);
}


This looks bad to me, really bad! To my understanding it is attempting to overwrite the underlying const data buffer of the string in place!Now that others have pointed out it doesn't do what you say it does, the only thing "bad" about it is that it will not work for various locales and languages. German (I believe) is the one where this method breaks down.

Regards,

Paul McKenzie

spoon!
May 6th, 2008, 07:21 PM
i believe you can change the locale and it will work

exterminator
May 6th, 2008, 10:24 PM
The problem is locales are not standardized. You will find different compilers supporting different set of locales and locale names. However, if you take some care it is not tough to implement the right algorithm for your compiler. Note - an algorithm - because it in no way really needs to be a member function of std::string. Here is a post that I had made around this, few weeks back - http://learningcppisfun.blogspot.com/2008/04/case-insensitive-string-comparison.html

PredicateNormative
May 7th, 2008, 04:55 AM
Ok, technically there is nothing wrong with it but I just don't like the idea of modifying the string data in that way, it just seems wrong to me. However, since everyone disagrees with me, I think I should bow to the majority ruling and adjust my ideas with regard to the wrongness of the operation. :) Thanks all.

PredicateNormative
May 7th, 2008, 10:34 AM
I ended up implementing the following:


/** Convert a string of type T (char*, char[] or std::string) to lower case
* @param str T string to be converted.
* @note Usage: std::string var = StringToLower(str);
*/
template <typename T> std::string StringToLower(T& str)
{
std::string tmp = str;
std::transform(tmp.begin(), tmp.end(), tmp.begin(), tolower);
return tmp;
}


Any disagreements/improvements/comments are welcome.

Hermit
May 7th, 2008, 11:09 AM
I ended up implementing the following:


/** Convert a string of type T (char*, char[] or std::string) to lower case
* @param str T string to be converted.
* @note Usage: std::string var = StringToLower(str);
*/
template <typename T> std::string StringToLower(T& str)
{
std::string tmp = str;
std::transform(tmp.begin(), tmp.end(), tmp.begin(), tolower);
return tmp;
}


Any disagreements/improvements/comments are welcome.
I fail to see the advantage of this, even if you feel it is erroneous to modify a string through its iterators (which it isn't). You're still modifying tmp in the same way you would have modified str, and str is being passed by non-const reference for no reason.

If you really don't want to modify the string through its iterators, for some odd reason or another, you could do:

std::string StringToLower(const std::string & str)
{
std::string tmp;
std::transform(str.begin(), str.end(), std::back_insert_iterator<std::string>(tmp), tolower);
return tmp;
}

PredicateNormative
May 7th, 2008, 11:39 AM
Hi Hermit, my mistake, the interface should have been:


template <typename T> std::string StringToLower(const T& str);


I've conceded on the correctness of modifying the string through its iterators, I just didn't want the input string modified because it would impact other code. Thanks for the suggestion. :)

kempofighter
May 8th, 2008, 08:50 PM
Hi Hermit, my mistake, the interface should have been:


template <typename T> std::string StringToLower(const T& str);


I've conceded on the correctness of modifying the string through its iterators, I just didn't want the input string modified because it would impact other code. Thanks for the suggestion. :)

What if I call the function like this?

double x(1.5);
std::string lowerCase = StringToLower(x);


What happens then? In other words, why do you think that the input should be a template parameter and not a std::string? This interface that you posted seems nonsensical to me and would allow someone to call the function with any type. Is that really what you intend?

PredicateNormative
May 9th, 2008, 06:06 AM
What if I call the function like this?

double x(1.5);
std::string lowerCase = StringToLower(x);


What happens then? In other words, why do you think that the input should be a template parameter and not a std::string? This interface that you posted seems nonsensical to me and would allow someone to call the function with any type. Is that really what you intend?

That will not compile since a double cannot be assigned to a string, so there is no problem. ;) The point of the template is to allow the function to be called with an input of type std::string, char* or char[].

Mitsukai
May 9th, 2008, 06:40 AM
But inside the function you are converting str to a string, so you might as well make the parameter a non-referenced string because it will do the same thing.

std::string StringToLower(std::string tmp)
{
std::transform(tmp.begin(), tmp.end(), tmp.begin(), tolower);
return tmp;
}

PredicateNormative
May 9th, 2008, 06:53 AM
That's true enough. :)