  1. #1
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    More readonly/const stuff: std::string vs const char

    I have a source file with a bunch of C-style strings hardcoded as const char arrays; for each of those strings there is also a length integer in the code.

    At its simplest (it's a bit more complex in reality, but this works fine as an example):
    Code:
    const char hello[] = "hello";
    int hello_length = 5;
    
    const char foo[] = "foo";
    int foo_length = 3;
    
    const char bar[] = "bar";
    int bar_length = 3;
    There are thousands of those strings, and they can be several hundred or even thousands of characters long.


    I have functions that return one of the above...

    Code:
    const char* getstring(const type& which_one_do_you_want, int& len)
    {
    // ... code to decide which one to use
        len = hello_length;
        return hello;
    }
    If I change this to return a std::string instead of a const char*...
    Code:
    std::string getstring(const type& which_one_do_you_want, int& len)
    {
    // ... code to decide which one to use
        return std::string(hello, hello_length);
    }
    Is there a way to prevent the string from doing an allocation+copy and instead make the string point to the const string?
    Possibly by "messing around" with the allocator template parameter of std::string?

    Or is there little point in returning a string in this case?
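
    For readers on a newer toolchain: a non-owning view is one way to get "a string that points to the const data". The following is only a sketch, assuming a C++17 compiler (std::string_view did not exist when this was posted). The StringId enum is a hypothetical stand-in for the "type" parameter, and the lengths are taken from the arrays rather than from separate variables.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Same kind of data as in the example above.
const char hello[] = "hello";
const char foo[]   = "foo";

// Hypothetical selector standing in for the "type" parameter.
enum class StringId { Hello, Foo };

// Returns a view (pointer + length) into the static array.
// Nothing is allocated and no characters are copied.
std::string_view getstring(StringId which)
{
    switch (which) {
    case StringId::Hello: return std::string_view(hello, sizeof(hello) - 1);
    case StringId::Foo:   return std::string_view(foo, sizeof(foo) - 1);
    }
    assert(false && "unknown StringId");
    return std::string_view();
}
```

    The view is only valid while the underlying array exists, which is not a concern for static data. It is not a std::string, though, so any interface that requires a std::string still forces a copy at that boundary.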

  2. #2
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: More readonly/const stuff: std::string vs const char

    Wouldn't it be simpler to make hello, foo, bar, etc const std::string objects, then return a const reference to one of them from the function? Plus, you wouldn't need separate hello_length, foo_length, etc variables since each string keeps track of its own length.
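
    A minimal sketch of that suggestion, assuming a hypothetical StringId enum in place of the "type" parameter:

```cpp
#include <cassert>
#include <string>

// Hypothetical selector standing in for the "type" parameter.
enum StringId { HelloId, FooId, BarId };

// Each string owns its characters and knows its own length,
// so the separate *_length variables are no longer needed.
const std::string hello("hello");
const std::string foo("foo");
const std::string bar("bar");

// Returns a reference to an existing object, so the call itself
// neither allocates nor copies.
const std::string& getstring(StringId which)
{
    switch (which) {
    case HelloId: return hello;
    case FooId:   return foo;
    case BarId:   return bar;
    }
    assert(false && "unknown StringId");
    return hello;
}
```

    The cost does not disappear; it moves to program startup, where every std::string object is constructed and copies its literal once.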

  3. #3
    Join Date
    Oct 2008
    Posts
    1,456

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by OReubens View Post
    Is there a way to prevent the string from doing an allocation+copy and instead make the string point to the const string ?
    possibly by "messing around" with the allocator template parameter to std::string ?
    If by "messing around" you mean writing implementation-specific, ready-to-explode code, then yes, it should be doable (essentially, you need to track down the exact allocation strategy of that specific std::string implementation and react accordingly; note that, in general, allocators know neither what they are going to allocate nor when). Not advisable, though.
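
    To illustrate why the allocator is the wrong hook, here is a sketch of a counting allocator, assuming a C++11 standard library that accepts minimal allocators. CountingAllocator, CountedString and allocations_for are made-up names for this example. The allocator is only asked for a number of characters; the string then copies the source into whatever block it gets back.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Counts how often a string asks for memory. Illustrative only.
static std::size_t g_allocations = 0;

template <class T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    // The allocator is told how many objects to make room for.
    // It never sees the characters that will be copied into the block.
    T* allocate(std::size_t n)
    {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>,
                          CountingAllocator<char> > CountedString;

// Builds a string of `len` characters from `src` and reports how many
// allocations that construction triggered.
std::size_t allocations_for(const char* src, std::size_t len)
{
    assert(src != 0);
    const std::size_t before = g_allocations;
    CountedString s(src, len);
    assert(s.size() == len);
    return g_allocations - before;
}
```

    For any source longer than the implementation's internal buffer, the construction goes through allocate() at least once and then copies, regardless of what the allocator does internally.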

  4. #4
    Join Date
    Oct 2006
    Location
    Sweden
    Posts
    3,654

    Re: More readonly/const stuff: std::string vs const char

    If you feel that this really has to be modified I agree with Laserlight's suggestion.

    On the other hand what's so bad with keeping the code as it is? As long as it's guaranteed that no invalid pointer is ever returned I wouldn't change anything. Since it's quite error prone I assume that the length isn't hardcoded like that in the real code?

  5. #5
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: More readonly/const stuff: std::string vs const char

    Toying with the allocator would only defer the problem to later, once you are handling strings with incompatible allocator types.
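
    A sketch of what "incompatible allocator types" means in practice, assuming C++11. ResourceAllocator, ResourceString, count_chars and to_std_string are hypothetical names. The allocator is part of the string's type, so a string with a custom allocator cannot be passed where a std::string is expected without converting it, and the conversion copies.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical custom allocator; it simply forwards to operator new.
template <class T>
struct ResourceAllocator {
    typedef T value_type;

    ResourceAllocator() {}
    template <class U>
    ResourceAllocator(const ResourceAllocator<U>&) {}

    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const ResourceAllocator<T>&, const ResourceAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ResourceAllocator<T>&, const ResourceAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>,
                          ResourceAllocator<char> > ResourceString;

// The allocator is part of the type, so this is not std::string.
static_assert(!std::is_same<ResourceString, std::string>::value,
              "a different allocator gives a different string type");

// An ordinary interface written against std::string.
std::size_t count_chars(const std::string& s) { return s.size(); }

// Crossing the boundary needs an explicit conversion, which copies.
std::string to_std_string(const ResourceString& r)
{
    std::string out(r.data(), r.size());
    assert(out.size() == r.size());
    return out;
}
```

    Calling count_chars() directly with a ResourceString does not compile; the caller has to go through something like to_std_string() first, which brings back the allocation and copy.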

    As laserlight said, why not just declare your objects as std::string, and return by const reference? Unless you have people accessing the variables directly, it should even be doable without breaking any existing code. This seems like the very best solution. Did you benchmark to confirm this was an actual performance issue?

    If anything, I'd say there is a usability issue: I'd provide that "getstring" function of yours without the len parameter. Then the user can write this directly:

    Code:
    std::string my_string = getstring(hello_type); //Auto conversion from char* to std::string
    Yes, you'd pay "even more" since you'd have to strlen, or reallocate during construction (depending on implementation), but:
    a) It is much easier on the coder to not have to declare a "len" variable, and if there is one thing I've learned, it's that readability/maintainability trumps performance.
    a.a) This holds even if the user wants to manipulate a const char*. If the user doesn't care about the length (for example, for a printf), then he won't have to create a dummy variable.
    b) Most string implementations allocate their first buffer at about 32 bytes, so there probably won't even be a change in performance.
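
    The convenience overload described above could look roughly like this. It is a sketch only: the StringId enum (with hello_type as one value) is an assumption about what "type" is, and the length is returned as std::size_t rather than int.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const char hello[] = "hello";
const char foo[]   = "foo";

// Hypothetical selector; the thread only calls it "type".
enum StringId { hello_type, foo_type };

// Existing style: pointer plus length through an out-parameter.
const char* getstring(StringId which, std::size_t& len)
{
    if (which == hello_type) {
        len = sizeof(hello) - 1;
        return hello;
    }
    assert(which == foo_type);
    len = sizeof(foo) - 1;
    return foo;
}

// Convenience overload for callers that do not need the length.
const char* getstring(StringId which)
{
    std::size_t ignored = 0;
    return getstring(which, ignored);
}
```

    Callers that want a std::string can then write std::string s = getstring(hello_type); at the price of a strlen() inside the constructor, while callers that already need the length keep the two-argument form.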

  6. #6
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by laserlight View Post
    Wouldn't it be simpler to make hello, foo, bar, etc const std::string objects, then return a const reference to one of them from the function? Plus, you wouldn't need separate hello_length, foo_length, etc variables since each string keeps track of its own length.
    There is over 50 MB of string resources stored this way. As it is now, it all resides in a const/read-only section of the exe image. Because of how that works, the pages where this data resides are not "loaded" when the exe starts; they are paged in when needed.

    If I were to make all of them std::strings:

    1) My code would STILL contain 50 MB of static const char data in some section decided by the compiler/linker.
    2) Additionally, I would have a couple of thousand std::string instances, allocating something over 50 MB of data on the heap.
    It would also increase the start time of my exe, as it would need to allocate all this memory and copy 50 MB worth of string data onto the heap.

    Not really an acceptable scenario.


    The problem is that I do have a LOT of accesses to the strings, so the continuous allocation/copy/deallocation is taking a significant portion of the running time. It would be awesome if I could avoid this.

  7. #7
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by monarch_dodra View Post
    Toying with the allocator would only defer the problem to later, once you are handling strings with incompatible allocator types.
    Care to elaborate?
    I'm still not quite sure how these allocators do their thing.

    As laserlight said, why not just declare your objects as std::string, and return by const reference? Unless you have people accessing the variables directly, it should even be doable without breaking any existing code. This seems like the very best solution. Did you benchmark to confirm this was an actual performance issue?
    It very much is. It takes nearly 2 minutes to construct all the data as std::string.
    And it increases the memory usage of my exe by over 60 MB. This affects subsequent runtime by requiring additional paging.


    Code:
    std::string my_string = getstring(hello_type); //Auto conversion from char* to std::string
    True, but on strings of 10K length or more, doing a strlen() each time ends up eating quite a bit of additional runtime.

    I was "sort of" hoping std::string would resolve this by allowing me an easy way to combine both contents and length.
    And it works. I can make a
    Code:
    std::string getstring() const { return std::string(hello, hello_length); }
    This takes the strlen() time out of it, but it still allocates and copies the 10K string (over and over).

    a) It is much easier on the coder to not have to declare a "len" variable, and if there is one thing I've learned, it's that readability/maintainability trumps performance.
    The 'coders' have no work on it; see above, the length is derived by code.
    The actual strings are in a .cpp that is generated by another program during the build. Nobody manually maintains this source. This is also why it was easy to change/test it to have strings instead of const chars.


    b) Most string implementations allocate their first buffer at about 32 bytes, so there probably won't even be a change in performance.
    It still does a copy, which has a noticeable effect on the runtime performance.
    Last edited by OReubens; July 8th, 2012 at 08:02 AM.

  8. #8
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by S_M_A View Post
    Since it's quite error prone I assume that the length isn't hardcoded like that in the real code?
    The above was a simplified example of what I want to achieve.

    In reality the strings and lengths are all encoded into one huge chunk of memory (a static const char[], if you will). The start of the string and the length are determined by the getstring() function. The length is not determined by doing a strlen() from the start of the string; it is determined by taking the start of the next string and subtracting the start of this one, minus 1 for the terminator.
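
    The layout described here can be sketched as follows. The blob contents and the offset table are invented for the example (the real ones are generated and far larger), and the table carries one extra entry so the last string also has a "next start".

```cpp
#include <cassert>
#include <cstddef>

// One block holding every string, each followed by its terminator.
// The literal's own trailing '\0' terminates "bar".
static const char blob[] = "hello\0foo\0bar";
static_assert(sizeof(blob) == 14, "three strings plus three terminators");

// Hypothetical start-offset table. The extra last entry marks the end
// of the block.
static const std::size_t starts[] = { 0, 6, 10, 14 };
static const std::size_t string_count = sizeof(starts) / sizeof(starts[0]) - 1;

// Returns the start of string `index` and stores its length in `len`.
// Length = next start - this start - 1 (the terminator); no strlen().
const char* getstring(std::size_t index, std::size_t& len)
{
    assert(index < string_count);
    len = starts[index + 1] - starts[index] - 1;
    return blob + starts[index];
}
```

    Both the pointer and the length come from table lookups, so the cost of a call does not depend on how long the string is.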

  9. #9
    Join Date
    Apr 1999
    Posts
    27,449

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by OReubens View Post
    THe problem is that I do have a LOT of access to the strings, so the continuous allocation/copy/deallocation is taking a significant portion of the running time. would be awesome if I could avoid this.
    In a typical run of your application, how many different strings are actually used? If the same set of strings are repeatedly being requested, then the obvious solution would be to build some sort of map of ints to string pointers.
    Code:
    #include <map>
    #include <string>
    #include <utility>

    typedef std::map<int, std::string*> StringMap;
    StringMap theStrings;
    //...
    std::string& getString(int theStringIWant)
    {
        // Look the key up once and reuse the iterator.
        StringMap::iterator it = theStrings.find(theStringIWant);
        if (it == theStrings.end())
        {
            // First request: build the string once and cache it.
            // "whatever" stands for the source characters of this string.
            std::string* pS = new std::string(whatever);
            theStrings.insert(std::make_pair(theStringIWant, pS));
            return *pS;
        }
        return *it->second;
    }
    A reference is returned, not an object, so the only copying and allocation is done if the string has not yet been requested. Since the map is keyed on ints, the lookup time will be negligible. Of course, you have to deallocate the string data that has been built into the map at the end of the program.

    So if you're requesting the same strings over and over again, maybe something like this could be useful.
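
    A variant of the same idea that stores the strings by value, so nothing has to be deallocated by hand at the end of the program. This is a sketch under assumptions: the sources and lengths tables are made up and stand in for however the real program finds a string's characters and length.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Stand-in for the real lookup of a string's characters and length.
static const char* const sources[] = { "hello", "foo", "bar" };
static const std::size_t lengths[] = { 5, 3, 3 };

typedef std::map<int, std::string> StringCache;
static StringCache cache;

// Builds each string at most once and returns a reference to the
// cached copy. References into a std::map stay valid when other
// elements are inserted, and the map frees everything on destruction.
const std::string& getString(int id)
{
    assert(id >= 0 && id < 3);
    StringCache::iterator it = cache.find(id);
    if (it == cache.end()) {
        it = cache.insert(
            std::make_pair(id, std::string(sources[id], lengths[id]))).first;
    }
    return it->second;
}
```

    As with the pointer version, every string that has been requested once stays in memory for the rest of the run.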

    Regards,

    Paul McKenzie

  10. #10
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by Paul McKenzie View Post
    So if you're requesting the same strings over and over again, maybe something like this could be useful.
    About 90% of the strings will be accessed in a typical run of the application. The remaining 10% are exceptional.

    Of the 90%, about a third of them (which third depends on circumstances I can't predict) will be accessed "a lot", with the remaining 2/3 being accessed "a lot less". The ballpark ratio is 200 to 1 in number of accesses.
    If circumstances change, so does the number of accesses to each string relative to the others.

    Even with all that... and even if circumstances didn't change and you could somehow make an ideal "recently used" caching system, I cannot afford to have 30% of all the strings permanently allocated and using 35 MB or so of RAM. The majority of the strings are "just" too long for the string's internal buffer, so all those objects add up. When I did my earlier 50 MB calculation, I didn't even count this in (I wasn't aware of it); the actual size of the string objects (internal buffer + pointer + length + maxsize + allocator pointer, ...) plus the memory they point to adds up to just over 90 MB (for around 50 MB of actual const char strings).
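
    The per-object overhead mentioned here can be estimated with something like the sketch below. estimated_footprint is a made-up helper, the result depends on the standard library in use, and allocator bookkeeping and rounding are ignored, so treat the number as a lower bound rather than a measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Rough estimate of the memory one std::string occupies: the object
// itself plus, when the characters live outside it, the heap block.
std::size_t estimated_footprint(const std::string& s)
{
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> before;

    // If data() points inside the object, the characters sit in the
    // internal buffer and no separate block exists.
    const bool internal =
        !before(s.data(), obj_begin) && before(s.data(), obj_end);
    assert(s.capacity() >= s.size());
    return internal ? sizeof(std::string)
                    : sizeof(std::string) + s.capacity() + 1;
}
```

    Summing this over all strings gives a feel for how the object size plus the external buffers can push 50 MB of characters well past 50 MB of actual memory.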

    35 MB may seem like peanuts. But not every computer out there has 64 bits of address space and gigabytes of RAM installed. And even if it does, you can't assume every program on the machine has that big of a sandbox it is allowed to play in.

    Right now, about 30% of the entire runtime of the application is spent in string allocation/copying/deallocation of static const chars. I already managed to take it from 40% down to this 30% by using the string constructor with a length instead of the one without.


    (Yeah, I know, I'm not the typical average programmer out there.)

  11. #11
    Join Date
    Apr 1999
    Posts
    27,449

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by OReubens View Post
    35 MB may seem like peanuts. But not every computer out there has 64 bits of address space and gigabytes of RAM installed.
    I don't see where the "64 bits of addressing" comes into play when allocating 35 MB.
    And even if it does, you can't assume every program on the machine has that big of a sandbox it is allowed to play in.
    So what are your program's minimum requirements? Maybe you should consider increasing whatever minimum requirements you have (if you have these requirements) and state these requirements in black and white, as most software shops do. Then you get no surprises if someone's system doesn't have the horsepower.

    But quite honestly, 35 MB of allocated data should be manageable on most, if not all, modern systems.

    Regards,

    Paul McKenzie

  12. #12
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: More readonly/const stuff: std::string vs const char

    Quote Originally Posted by Paul McKenzie View Post
    I don't see where the "64 bits of addressing" comes into play when allocating 35 MB.
    Just a casual observation.
    Most people don't seem to worry at all if even very simple programs use several hundred MB of RAM. Optimizing memory usage seems to be the least of a lot of programmers' worries. Many programmers have become accustomed to having 64-bit addressing and multiple gigabytes of RAM to play with.

    <tinfoil hat>It's a conspiracy of the RAM chip manufacturers I tellz ya!!!</tinfoil hat>


    So what are your program's minimum requirements? Maybe you should consider increasing whatever minimum requirements you have (if you have these requirements) and state these requirements in black and white, as most software shops do. Then you get no surprises if someone's system doesn't have the horsepower.
    I cannot increase them by that amount.

    It gets even worse, as we expect to need to internationalize "soon" and to change all the strings to wchar_t, which would double the memory needs if we stored all the strings permanently.

    The app also needs memory for other stuff. The string handling is simply the current bottleneck in runtime performance.


    But quite honestly, 35 MB of allocated data should be manageable on most, if not all, modern systems.
    <tinfoil hat>OMG you're one of them !!!</tinfoil hat>

    The 35 MB is the case if I could somehow "optimize" the cache to hold only the 30% most used strings, treat the remaining lower-usage 70% as temporaries, and adjust this cache as needed. I see no realistic way of doing this.
    If I store 'em all I'm at 90 MB. With the expected internationalization, this turns into 160 MB. We also have some extra features planned which will increase this amount even further.
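
    For reference, the kind of bounded "recently used" cache being discussed might be sketched like this. It is an illustration, not a claim that it fits the memory budget above: BoundedStringCache is a made-up class, the budget counts only character bytes (not per-object overhead), and C++11 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Keeps recently used strings as std::string objects, up to a byte budget.
class BoundedStringCache {
public:
    explicit BoundedStringCache(std::size_t budget_bytes)
        : budget_(budget_bytes), used_(0) {}

    // Returns the cached string for `id`, building it from (src, len) on
    // a miss. The reference is only valid until the next call to get().
    const std::string& get(int id, const char* src, std::size_t len)
    {
        assert(src != nullptr);
        auto found = index_.find(id);
        if (found != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            entries_.splice(entries_.begin(), entries_, found->second);
            return found->second->second;
        }
        // Miss: copy once, then evict from the back until within budget.
        // The entry just inserted is never evicted.
        entries_.emplace_front(id, std::string(src, len));
        index_[id] = entries_.begin();
        used_ += len;
        while (used_ > budget_ && entries_.size() > 1) {
            used_ -= entries_.back().second.size();
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        return entries_.front().second;
    }

    std::size_t size() const { return entries_.size(); }
    bool contains(int id) const { return index_.count(id) != 0; }

private:
    typedef std::list<std::pair<int, std::string> > Entries;
    std::size_t budget_;
    std::size_t used_;
    Entries entries_;
    std::unordered_map<int, Entries::iterator> index_;
};
```

    Hits cost a hash lookup and a list splice; misses still pay the allocation and copy, so the benefit depends entirely on how stable the set of frequently used strings is.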


    If this were the only app running on the server... yes... maybe... Even then it's still a bad excuse for having a 2-minute startup time and wasting 90 MB duplicating data that is already present elsewhere in the exe image.

    Storing all the const char's as strings all the time is simply not an option.
