-
July 7th, 2012, 06:43 PM
#1
More readonly/const stuff: std::string vs const char
I have a source file with a bunch of C-style strings hardcoded as const char arrays; for each of those strings there's also a length integer in the code.
At its simplest (it's a bit more complex in reality, but this works fine as an example):
Code:
const char hello[] = "hello";
int hello_length = 5;
const char foo[] = "foo";
int foo_length = 3;
const char bar[] = "bar";
int bar_length = 3;
There are thousands of those strings, and they can be several hundred or even thousands of characters long.
I have functions that return one of the above...
Code:
const char* getstring(const type& which_one_do_you_want, int& len)
{
    // ... code to decide which one to use
    len = hello_length;
    return hello;
}
If I change this to return a std::string instead of a const char*...
Code:
std::string getstring(const type& which_one_do_you_want, int& len)
{
    // ... code to decide which one to use
    return std::string(hello, hello_length);
}
Is there a way to prevent the string from doing an allocation+copy and instead make the string point to the const string?
Possibly by "messing around" with the allocator template parameter to std::string?
Or is there little point in returning a string in this case?
-
July 7th, 2012, 11:53 PM
#2
Re: More readonly/const stuff: std::string vs const char
Wouldn't it be simpler to make hello, foo, bar, etc const std::string objects, then return a const reference to one of them from the function? Plus, you wouldn't need separate hello_length, foo_length, etc variables since each string keeps track of its own length.
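A minimal sketch of this suggestion, assuming the strings are defined at namespace scope. The `StringId` enum and its values are hypothetical stand-ins for the `type` parameter in the original post, which is not shown.

```cpp
#include <cassert>
#include <string>

// Hypothetical selector; the original post only calls it `type`.
enum StringId { HELLO, FOO, BAR };

// Each std::string owns its text and knows its own length,
// so the separate *_length variables are no longer needed.
const std::string hello = "hello";
const std::string foo = "foo";
const std::string bar = "bar";

// Returning a const reference hands out the existing object;
// the function itself makes no copy.
const std::string& getstring(StringId which)
{
    switch (which) {
    case HELLO: return hello;
    case FOO:   return foo;
    default:
        assert(which == BAR);
        return bar;
    }
}
```

A caller that binds the result to a `const std::string&` gets no copy; a caller that assigns it to a `std::string` by value still copies. Each of these objects is also constructed from its literal at program startup.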
-
July 8th, 2012, 02:34 AM
#3
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by OReubens
Is there a way to prevent the string from doing an allocation+copy and instead make the string point to the const string ?
possibly by "messing around" with the allocator template parameter to std::string ?
If by "messing around" you mean writing implementation-specific, ready-to-explode code, then yes, it should be doable (essentially, you need to track down the exact allocation strategy of that specific std::string implementation and react accordingly; note that allocators generally know neither what they are allocating nor when they will be asked to allocate). Not advisable though.
-
July 8th, 2012, 04:13 AM
#4
Re: More readonly/const stuff: std::string vs const char
If you feel that this really has to be modified I agree with Laserlight's suggestion.
On the other hand what's so bad with keeping the code as it is? As long as it's guaranteed that no invalid pointer is ever returned I wouldn't change anything. Since it's quite error prone I assume that the length isn't hardcoded like that in the real code?
-
July 8th, 2012, 07:19 AM
#5
Re: More readonly/const stuff: std::string vs const char
Toying with the allocator would only defer the problem to later, once you are handling strings with incompatible allocator types.
As laserlight said, why not just declare your objects as std::string, and return by const reference? Unless you have people accessing the variables directly, it should even be doable without breaking any existing code. This seems like the very best solution. Did you benchmark to confirm this was an actual performance issue?
If anything, I'd say there is a usability issue: I'd provide that "getstring" function of yours without the len parameter. Then the user can write this directly:
Code:
std::string my_string = getstring(hello_type); //Auto conversion from char* to std::string
Yes, you'd pay "even more" since you'd have to strlen, or reallocate during construction (depending on implementation), but:
a) It is much easier on the coder to not have to declare a "len" variable, and if there is one thing I've learned, it's that readability/maintainability trumps performance.
a.a) This holds even if the user wants to manipulate a const char*. If the user doesn't care about the length (for example, a printf), then he won't have to create a dummy variable.
b) Most string implementations allocate their first buffer at about 32 bytes, so there probably won't even be a change in performance.
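The suggestion of a `getstring` without the len parameter could be sketched as an overload next to the existing function, so callers that need the length keep it. This assumes the signature from the first post; `StringId` is a hypothetical stand-in for `type`, and the selection logic is reduced to a single string.

```cpp
#include <cassert>
#include <string>

enum StringId { HELLO };  // hypothetical stand-in for `type`

const char hello[] = "hello";
const int hello_length = 5;

// Original form: the caller receives both the pointer and the length.
const char* getstring(StringId which, int& len)
{
    assert(which == HELLO);  // only one string in this sketch
    len = hello_length;
    return hello;
}

// Convenience overload for callers that do not care about the length.
const char* getstring(StringId which)
{
    int unused = 0;
    return getstring(which, unused);
}
```

With the second overload, `std::string my_string = getstring(HELLO);` compiles as written in the post, at the cost of the string constructor scanning for the terminator.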
-
July 8th, 2012, 07:41 AM
#6
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by laserlight
Wouldn't it be simpler to make hello, foo, bar, etc const std::string objects, then return a const reference to one of them from the function? Plus, you wouldn't need separate hello_length, foo_length, etc variables since each string keeps track of its own length.
There is over 50Mb of string resources this way. As it is now, it all resides in a const/read-only section in the exe image. Because of how that works, the pages where this data resides are not "loaded" when the exe starts; they are instead paged in when needed.
If I were to make all of them std::strings:
1) My code would STILL contain 50Mb of static const char data in some section decided by the compiler/linker.
2) Additionally, I would have a couple of thousand std::string instances, allocating something over 50Mb of data on the heap.
It would also increase the start time of my exe, as it would need to allocate all this memory and copy 50Mb worth of string data onto the heap.
Not really an acceptable scenario.
The problem is that I do have a LOT of accesses to the strings, so the continuous allocation/copy/deallocation is taking a significant portion of the running time. It would be awesome if I could avoid this.
-
July 8th, 2012, 07:42 AM
#7
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by monarch_dodra
Toying with the allocator would only defer the problem to later, once you are handling strings with incompatible allocator types.
care to elaborate ?
I'm still not quite sure how these allocators do their thing.
As laserlight said, why not just declare your objects as std::string, and return by const reference? Unless you have people accessing the variables directly, it should even be doable without breaking any existing code. This seems like the very best solution. Did you benchmark to confirm this was an actual performance issue?
It very much is. It takes nearly 2 minutes to construct all the data as std::string.
And it increases the memory usage of my exe by over 60Mb. This affects subsequent runtime by requiring additional paging.
Code:
std::string my_string = getstring(hello_type); //Auto conversion from char* to std::string
True, but on strings of 10K length or more, this ends up eating quite a bit of additional runtime to do a strlen() each time.
I was "sort of" hoping std::string would resolve this by allowing me an easy way to combine both contents and length.
And it works. I can write:
Code:
std::string getstring() const { return std::string(hello, hello_length); }
This takes the strlen() time out of it, but it still allocates and copies the 10K string (over and over).
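One way to combine contents and length without the copy is a non-owning view. A minimal sketch, assuming a C++17 compiler so that `std::string_view` is available (it did not exist when this thread was written); `getstring_view` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

const char hello[] = "hello";
const std::size_t hello_length = 5;

// A string_view stores only a pointer and a length; constructing it
// neither allocates nor copies the characters it refers to.
std::string_view getstring_view()
{
    // The stored length excludes the terminator that follows the text.
    assert(hello[hello_length] == '\0');
    return std::string_view(hello, hello_length);
}
```

The view stays valid only as long as the underlying array does, which for static const data is the whole run of the program. Callers that really need a `std::string` can still construct one from the view explicitly, paying for the copy only there.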
a) It is much easier on the coder to not have to declare a "len" variable, and if there is one thing I've learned, it's that readability/maintainability trumps performance.
The 'coders' have no work on it; see above, the length is derived by code.
The actual strings are in a .cpp that is generated by another program during the build. Nobody manually maintains this source. This is also why it was easy to change/test it to have strings instead of const chars.
b) Most string implementations allocate their first buffer at about 32 bytes, so there probably won't even be a change in performance.
It still does a copy, which has a noticeable effect on the runtime performance.
Last edited by OReubens; July 8th, 2012 at 08:02 AM.
-
July 8th, 2012, 07:49 AM
#8
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by S_M_A
Since it's quite error prone I assume that the length isn't hardcoded like that in the real code?
The above was a simplified example of what I want to achieve.
In reality the strings and lengths are all encoded into one huge chunk of memory (a static const char[], if you will). The start of the string and the length are determined by the getstring() function. The length is not determined by doing a strlen() from the start of the string; it is determined by taking the start of the next string and subtracting the start of this one, minus 1 for the terminator.
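The layout described here can be sketched as follows. The names `blob` and `offsets` are hypothetical, and the three-string table is made up for illustration; the real data comes from the generated .cpp.

```cpp
#include <cassert>
#include <cstddef>

// All strings packed back to back, each followed by its '\0' terminator.
// The literal's implicit final '\0' terminates "bar".
static const char blob[] = "hello\0foo\0bar";

// Start offset of each string, plus one trailing entry marking the end of the blob.
static const std::size_t offsets[] = { 0, 6, 10, 14 };

const char* getstring(std::size_t index, std::size_t& len)
{
    assert(index + 1 < sizeof(offsets) / sizeof(offsets[0]));
    // Length = start of the next string - start of this one - 1 for the terminator.
    len = offsets[index + 1] - offsets[index] - 1;
    return blob + offsets[index];
}
```

The extra trailing offset lets the last string use the same subtraction as every other one, so no strlen() is needed anywhere.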
-
July 8th, 2012, 09:51 AM
#9
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by OReubens
The problem is that I do have a LOT of access to the strings, so the continuous allocation/copy/deallocation is taking a significant portion of the running time. would be awesome if I could avoid this.
In a typical run of your application, how many different strings are actually used? If the same set of strings is repeatedly being requested, then the obvious solution would be to build some sort of map of ints to string pointers.
Code:
#include <map>
#include <string>

typedef std::map<int, std::string*> StringMap;
StringMap theStrings;
//...
std::string& getString(int theStringIWant)
{
    StringMap::iterator it = theStrings.find( theStringIWant );
    if ( it == theStrings.end() )  // reuse the iterator instead of searching twice
    {
        // First request: build the string once ("whatever" is a placeholder
        // for the source text) and remember it in the map.
        std::string *pS = new std::string(whatever);
        theStrings.insert(std::make_pair(theStringIWant, pS));
        return *pS;
    }
    else
        return *it->second;
}
A reference is returned, not an object, so the only copying and allocation is done if the string has not yet been requested. Since the map is keyed on ints, the lookup time will be negligible. Of course, you have to deallocate the string data that has been built into the map at the end of the program.
So if you're requesting the same strings over and over again, maybe something like this could be useful.
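A variation on this cache, sketched under the assumption that the source text is available as a pointer plus length; `lookupRaw` is a hypothetical placeholder for that lookup. Storing `std::string` values in the map instead of pointers means nothing has to be deallocated by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the real lookup into the const data.
const char* lookupRaw(int id, std::size_t& len)
{
    static const char hello[] = "hello";
    static const char foo[] = "foo";
    if (id == 0) { len = 5; return hello; }
    len = 3;
    return foo;
}

const std::string& getString(int id)
{
    static std::map<int, std::string> cache;  // the map owns the strings
    std::map<int, std::string>::iterator it = cache.find(id);
    if (it == cache.end()) {
        std::size_t len = 0;
        const char* raw = lookupRaw(id, len);
        assert(raw != 0);
        // Copy once on the first request; later requests reuse this entry.
        it = cache.insert(std::make_pair(id, std::string(raw, len))).first;
    }
    return it->second;
}
```

References to elements of a `std::map` stay valid when other elements are inserted, so a reference returned earlier is not invalidated by later lookups.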
Regards,
Paul McKenzie
-
July 8th, 2012, 11:29 AM
#10
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by Paul McKenzie
So if you're requesting the same strings over and over again, maybe something like this could be useful.
About 90% of the strings will be accessed in a typical run of the application. The remaining 10% are exceptional.
Of the 90%, about a third (which third depends on circumstances I can't predict) will be accessed "a lot", with the remaining 2/3 being accessed "a lot less". The ballpark ratio in number of accesses is 200 to 1.
If circumstances change, so does the number of accesses on each string relative to the others.
Even with all that... and even if circumstances didn't change and you could somehow make an ideal "recently used" caching system, I cannot afford to have 30% of all the strings permanently allocated and using 35Mb or so of RAM. The majority of the strings are "just" outside of the string's internal buffer, so all those objects add up. When I did my earlier 50Mb calculation, I didn't even count this in (I wasn't aware of it); the actual size of the string objects (internal buffer + pointer + length + max size + allocator pointer, ...) plus the memory they point to adds up to just over 90Mb (for around 50Mb of actual const char strings).
35Mb may seem like peanuts. But not every computer out there has 64bits of addressing space and gigabytes of RAM installed. And even if it does, you can't assume every program on the machine has that big of a sandbox it is allowed to play in.
Right now, about 30% of the entire runtime of the application is spent in string allocation/copying/deallocation of static const chars. I already managed to take it from 40% down to this 30% by using the string constructor with a length instead of the one without.
(Yeah, I know, I'm not the typical average programmer out there.)
-
July 8th, 2012, 11:46 AM
#11
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by OReubens
35Mb may seem like peanuts. But not every computer out there has 64bits of addressing space and gigabytes of RAM installed.
I don't see where the "64 bits of addressing" comes into play when allocating 35Mb.
And even if it does, you can't assume every program on the machine has that big of a sandbox it is allowed to play in.
So what are your program's minimum requirements? Maybe you should consider increasing whatever minimum requirements you have (if you have these requirements) and state these requirements in black and white, as most software shops do. Then you get no surprises if someone's system doesn't have the horsepower.
But quite honestly, 35 MB of allocated data should be able to be handled by most, if not all, modern systems.
Regards,
Paul McKenzie
-
July 8th, 2012, 12:06 PM
#12
Re: More readonly/const stuff: std::string vs const char
 Originally Posted by Paul McKenzie
I don't see where the "64 bits of addressing" comes into play when allocating 35Mb.
Just a casual observation.
Most people don't seem to have any worries at all even if very simple programs use several hundred Mb of RAM. Optimizing memory usage seems to be the least of a lot of programmers' worries. Many programmers have become so accustomed to having 64-bit addressing and multiple gigabytes of RAM to play with.
<tinfoil hat>It's a conspiracy of the RAM chip manufacturers I tellz ya!!!</tinfoil hat>
So what are your program's minimum requirements? Maybe you should consider increasing whatever minimum requirements you have (if you have these requirements) and state these requirements in black and white, as most software shops do. Then you get no surprises if someone's system doesn't have the horsepower.
I cannot increase them by that amount.
It gets even worse as we are expecting the need to internationalize "soon", which means changing all the strings to wchar_t and doubling memory needs if we were to store all the strings permanently.
The app also needs memory for other stuff. The stringhandling is simply the current bottleneck in runtime performance.
But quite honestly, 35 MB of allocated data should be able to be handled by most, if not all, modern system.
<tinfoil hat>OMG you're one of them !!!</tinfoil hat> 
The 35Mb is the case if I could somehow "optimize" the cache to hold only the 30% most-used strings, treat the remaining lower-usage 70% as temporaries, and adjust this cache as needed. I see no realistic way of doing this.
If I store them all I'm at 90Mb. With the expected internationalization, this turns into 160Mb. We also have some extra features planned which will increase this amount even further.
If this were the only app running on the server... yes... maybe... Even then it's still a bad excuse for having a 2-minute startup time and wasting 90Mb duplicating data that is already present elsewhere in the exe image.
Storing all the const char's as strings all the time is simply not an option.