CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 2 of 3 FirstFirst 123 LastLast
Results 16 to 30 of 45
  1. #16
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: std::string and contiguous memory?

    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman

  2. #17
    Join Date
    Feb 2005
    Location
    Denver
    Posts
    353

    Re: std::string and contiguous memory?

    Quote Originally Posted by JohnW@Wessex View Post
    Thanks for the link. One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?

  3. #18
    Join Date
    Nov 2003
    Posts
    1,902

    Re: std::string and contiguous memory?

    If you were in my code review, anything that doesn't require const_cast is the only correct solution. The contiguousness of std::string:ata/c_str is irrelevant since they can't satisfy the interface requirements. Respect the interface, even if you know it's wrong but can't change it.

    >> It is not documented anywhere in the library that the contents are or are not modified. And again, it's better to be safe than sorry.
    Your primary and only argument should be "respect the interface" with a dash of "better to be safe than sorry". Don't get side tracked on contiguousness - though I will say that if it isn't guaranteed explicitly by the C++ standard your working with, don't rely on it (another dash of safe-than-sorry).

    Make your code unquestionably correct so these code reviews don't last as long

    gg

  4. #19
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: std::string and contiguous memory?

    "Make your code unquestionably correct..."

    Sound advice
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman

  5. #20
    Join Date
    Feb 2005
    Location
    Denver
    Posts
    353

    Re: std::string and contiguous memory?

    Quote Originally Posted by Codeplug View Post
    If you were in my code review, anything that doesn't require const_cast is the only correct solution. The contiguousness of std::string:ata/c_str is irrelevant since they can't satisfy the interface requirements. Respect the interface, even if you know it's wrong but can't change it.
    I agree with this 100%. Sometimes I feel like I'm talking to a wall on some of these issues though. One of the arguments used by this "primary" code reviewer was "...the maintenance burden of using vectors of chars does not justify its use over std::string:ata() since both are guaranteed to work." This is absolutely ridiculous and I completely disagree with this statement. This is one of the reasons why I requested some help with some hard cold proof as to the incorrectness and flaw in this logic.

    Thanks for your input!

  6. #21
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: std::string and contiguous memory?

    Quote Originally Posted by sszd
    One question though, is data() guarenteed to return a null terminated string, or is that also implementation dependent? If not, this could also potentially cause a core dump by the library call.
    data() does not guarantee null termination; c_str() guarantees null termination.

    Quote Originally Posted by sszd
    Does anyone have a quote from the standard that indicates std::string is implementation dependent and is not guaranteed to be in contiguous memory (prior to the C++11 standard of course)?
    I think that this is by omission, i.e., since there was no requirement for storage to be contiguous, storage need not be contiguous. Herb Sutter has a word on this: Cringe not: Vectors are guaranteed to be contiguous (his reply concerning std::string on 2008-04-08 at 12:28 pm).

    With respect to his statement that "current ISO C++ does require &str[0] to cough up a pointer to contiguous string data", I note that there was a defect in the standard in that C++03 defined operator[] as returning data()[pos] (or rather "as if" it did so), but of course this cannot be so for the non-const overload, so it leaves open to interpretation just what that might mean if an implementor wanted to push the boundaries.

    Quote Originally Posted by sszd
    Does data() return a char const* to memory that is independent of the memory where the actual string is stored, or is this also implementation dependent?
    Implementation defined.

    EDIT:
    Quote Originally Posted by sszd
    One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?
    Yes. C++11 did not change that.

    Quote Originally Posted by Codeplug
    anything that doesn't require const_cast is the only correct solution.
    I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.
    Last edited by laserlight; June 28th, 2012 at 10:53 AM.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  7. #22
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: std::string and contiguous memory?

    Quote Originally Posted by sszd View Post
    Thanks for the link. One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?
    You are confusing two different notions here:

    c_str and data have ALWAYS guaranteed to return valid c-strings (ergo, contiguous in memory). The only problem is that is that the returned c-string is actually const, and you can't mutate it. This is still the case with C++11.

    C++11, on the other hand, added the guarantee that the storage of the inner chars that compose that std::string are contiguous. This means that if you extract a pointer from a_string[0], then it makes sense to iterate on that string.

    ----
    It would help to understand the whys of motivations of such a choice behind that choice.

    Historically, there have been tons (tons!) of ways to implement strings. All have there ups and downs. Some use reference counting, some used copy on write, some used a deque like approach etc... One of the major aspects of c++ has always been "The user has the choice", so the standard said "implement as you like, we force nothing on you". So they did not enforce contiguous storage.

    However, even if users had access to a beautiful c++ string class, C did not disappear, and having a way to extract a c-string from an std::string is a sine-qua-non requirement of std::string. Hence the c_str and data methods. Unfortunately, the returned string may or may not point to the actual inner working of the string. For example, a deque-based implementation could fill a temporary buffer to return on every call. Or, in the case of a copy-on-write string, the "internal" string could be potentially shared across different string objects. At this point mutating the returned c-str could create all sorts of trouble, so it was decided to ban that, and return a const char*.

    ----
    C++11 introduced move semantics, which pretty removes the need for copy on write. Also, C++11 noticed that everyone was using contiguous memory anyways. Therefore, it was decided to enforce contiguous storage to allow the famous &string[0], or &*astring.begin(). In theory, under these new conditions, c_str and data have no reason to return const data, but changing it now could have broken existing code.

    ----
    So there you have it.
    Disclaimer: I made up the "history", but the rationale is pretty accurate, I think.

  8. #23
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: std::string and contiguous memory?

    Quote Originally Posted by monarch_dodra
    c_str and data have ALWAYS guaranteed to return valid c-strings (ergo, contiguous in memory).
    data() does not provide that guarantee though since it does not guarantee null termination, otherwise it would be a wholly redundant part of the interface. (Wait, it is now a wholly redundant part of the interface, heheh.)
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  9. #24
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: std::string and contiguous memory?

    [QUOTE=laserlight;2073599I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.[/QUOTE]

    In one of our projects, we had a few C (and C++) APIs that did not have const correct input (eg, everything was a non-const pointer).

    We wrapped all these APIs inside a new const-correct interface. The use of const cast is minimal, and well documented, and mimited to a single object. We had someone take time to validate EXACTLY which const_casts were legal, and which aren't. Inside the rest of our code, we use the const-correct interface with full confidence.

    The problem (I find) with const_casting is it is like riding a car without a seat-belt. You take a risk every time, and every time you do it, you are less and less careful...

    I've even seen once a function: "we promise not to modify your input", but inside the implementation: "We modify the input for performance, but we remember to change it back before the end"...
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  10. #25
    Join Date
    Nov 2003
    Posts
    1,902

    Re: std::string and contiguous memory?

    >> One of the arguments... "...the maintenance burden of using vectors of chars does not justify its use ..."
    If any reader of the code has to stop and ask themselves:
    "hmm, const_cast<>, does that really not get modded?"
    "hmm, is that really guaranteed to be contiguous?"
    "hmm, this looks like a bug waiting to happen, what if..."
    Then you've added readability and maintenance burden.

    The interface is non-const. There is no guarantee that the interface won't modify the non-const buffer. Therefore there is no argument for using const_cast that could possibly hold water.

    gg

  11. #26
    Join Date
    Oct 2008
    Posts
    1,456

    Re: std::string and contiguous memory?

    Quote Originally Posted by Codeplug View Post
    Therefore there is no argument for using const_cast that could possibly hold water.
    well, given memory contiguity, there should be nothing wrong in doing something like

    Code:
    std::string s = ...; // s mutable
    
    if( !s.empty() )
        some_c_api_call( s.data() == &s[0] ? const_cast<char*>( s.data() ) : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
    this may prevent most std::vector conversions ...
    Last edited by superbonzo; June 28th, 2012 at 11:51 AM.

  12. #27
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: std::string and contiguous memory?

    Hmm... but once you know that s.data() == &s[0], then there's no need to use the const_cast because you can use &s[0]. It wouldn't be wrong, but it also wouldn't be necessary.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  13. #28
    Join Date
    Feb 2005
    Location
    Denver
    Posts
    353

    Re: std::string and contiguous memory?

    Quote Originally Posted by Codeplug View Post
    If any reader of the code has to stop and ask themselves:
    "hmm, const_cast<>, does that really not get modded?"
    "hmm, is that really guaranteed to be contiguous?"
    "hmm, this looks like a bug waiting to happen, what if..."
    Then you've added readability and maintenance burden.
    This is a good argument. Thanks.

  14. #29
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: std::string and contiguous memory?

    Quote Originally Posted by superbonzo View Post
    well, given memory contiguity, there should be nothing wrong in doing something like

    Code:
    std::string s = ...; // s mutable
    
    if( !s.empty() )
        some_c_api_call( s.data() == &s[0] ? const_cast<char*>( s.data() ) : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
    this may prevent most std::vector conversions ...
    Quote Originally Posted by laserlight View Post
    Hmm... but once you know that s.data() == &s[0], then there's no need to use the const_cast because you can use &s[0]. It wouldn't be wrong, but it also wouldn't be necessary.
    A particularly perverse implementation could decide to alias s[0] and *s.data(), but store the rest of the mutable data info somewhere else...

    &s[0] would be wrong (as the memory would not be contiguous), and const_casting the data would also be wrong too.

    Likely? No. But as long as we're on the subject of correctness...
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  15. #30
    Join Date
    Feb 2005
    Location
    Denver
    Posts
    353

    Re: std::string and contiguous memory?

    I'm still having difficulty with this conversation between this one code reviewer and myself. I think I'm going to simply stop trying to convince him and just move on.

    The code review, as well as our entire conversation has taken place over a series of emails throughout the day. If you don't mind, I'd like to get some feedback from some of you out there with regard to how this conversation progressed. I've posted the email threads below so you can see exactly how this conversation went, and each of our trains of thought. I'm sure my side of the conversation could have been better, but I am intimated by this individual due to the fact that he is held in such high regard by management, and I certainly did not want to offend him, although I may have done just that.

    My comments will be in blue, the code review's in green. Parts of the conversation that I stole from the internet (mostly quotes from various sites) will be in red. Oh, and I hope you don't mind, but I also integrated some statements from several of you out there into my own parts of the conversation. Many of you have been extremely helpful.

    Reviewer: There is no need to use a std::vector<char> instead of a std::string. The underlying storage of a std::string is guaranteed to be contiguous just like a std::vector (and std::string's storage can be accessed directly by using the “data” method).

    Me: Unless the C++ standard has changed (and we are using a fairly old compiler, relatively speaking), the last time I researched how strings are stored (albeit it's been many years), they are not guaranteed to be contiguous. It's completely up to the compiler designers on how this type of container is implemented, although most compilers are indeed implemented this way. Vectors, on the other hand, are guaranteed to be contiguous. With that said, it is extremely dangerous to pass a pointer to the internal memory of a string to a function that requires a pointer to a non-const string. That would of course require casting the constness away. I'm sure there's a very good reason that the data() method of string only returns pointer to const char. According to the standard, modifying the contents results in “undefined” behavior. Since the external C functions in the library being called in the code require non-const arguments, and not knowing whether or not those arguments are being modified, it's much safer to put the data in a buffer (in this case a non-const vector), and then pass that to the function.

    Reviewer: I don't know the entire history of the definition of std::string, but its underlying memory has been guaranteed to be contiguous for many years. You are assuming that because you were able to get the underlying pointer of a vector without a const-cast that it is “defined” behavior, but as far as I know, dereferencing the first non-const element of a vector is not a “defined” interface any more than const-casting std::string:ata(). Finally, even if it were, I would still suggest that the maintenance burden of using vectors of chars does not justify its use over std::string:ata() since both are guaranteed to work.

    Me: That was not an assumption. Getting the underlying address of a vector is well defined. And again, the underlying address of a string is “undefined behavior” if it is modified. It is not guaranteed to work.

    Reviewer: You are mixing the concept of “undefined” with “guaranteed to work”. You are guaranteed that the pointer to memory is valid under defined conditions, so if you are just using the allocated memory as a buffer, it is guaranteed to work.

    Me: Now I'm needing a sanity check. Here are some quotes I've found online.

    from josuttis's book: “The C++ standard library does not state clearly whether the elements of a vector are requried to be contiguous memory. However, it is the intention that this is guaranteed and it will be fixed due to the defect report”

    Reviewer: This information is obsolete; when was this book published? I would consider relying more on online sources for up-to-date information.

    from Meyers book concerning using vector and string in legacy code: “The approach to getting a pointer to the beginning of the container data that works for vector isn't reliable for strings, because (1) the data for strings are not guaranteed to be stored in contiguous memory, and (2) the internal representation of a string is not guaranteed to end with a null character.”

    Reviewer: More obsolete information. Note that I did not say this stuff has always been guaranteed by the C++ standard. It has been guaranteed in the present by the fact that all known implementations of std::string do guarantee these things, and the latest C++ standard that was approved last year guarantee that all future implementations will be compliant as well, so it is overly paranoid to assume otherwise.

    Me: See final comments below.

    Scott Meyer's has dedicated an entire chapter in his “Effective STL” book to usage of vector with legacy 'C' API's. He wouldn't have written it if he knew that vector was not guaranteed to be contiguous.

    From Scott Meyer's “Effective STL” book: - a vector is guaranteed to be contiguous, and therefore it can be used in legacy C and C++ functions that require a pointer to a contiguous buffer.

    Reviewer: I'm not claiming that what you did won't work, only that it's unnecessarily convoluted. I am concerned about the impact on maintenance and what junior engineers are learning.

    Me: See final comments below.

    Continuity is part of the vector abstraction. It's so important, in fact, that when it was discovered that the C++98 standard didn't completely guarantee continuity, the C++03 standard was amended to explicity add the guarantee.

    Why is it so important that vectors be contiguous? Because that's what you need to guarantee that a vector is layout-compatible with a C array, and therefore we have no reason not to use vector as a superior and type-safe alternative to arrays even when we need to exchange data with C code. So vector is our gateway to other languages and most operating systems, whose lingua franca is the venerable C array.

    Reviewer: That may have been true at the time he wrote it, but it is certainly not true now (std::array is more like a C array than a std::vector). In any case, he only stated that a vector is superior to a C array; there is no comment about whether a std::string is ever superior to a std::vector<char>.

    From cppreference.com: - std::basic_string:ata – const CharT* data() const; - Returns pointer to the underlying array serving as character storage.
    Writing to the character array accessed through data is undefined behavior.
    The pointer obtained from data() should not be considered valid after any non-const operation on the string.
    Since C++11, data() and c_str() perform the same function.


    from the C++11 standard clause 21.4.1 Paragraph 5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

    Me: You can see that the new standard does indeed guarantee std::string to be contiguous, however, we are not using a C++11 compliant compiler. So until that happens, I still (if for nothing more than safety) perfer to use a vector in cases like this.

    Reviewer: As noted above, there is no safety benefit because all known and future implementations of std::string are contiguous, but there is a maintenance burden in using std::vector<char> here. As a side issue, note that the new standard also adds the “data” method to vector and other containers, as calling that method for the underlying pointer is less obfuscating than the &foo[0] construct. For some reason, only the std::string doesn't have a non-const version of data added, perhaps over concern over backwards compatibilty. I wouldn't be bothered if you perferred the &foo[0] construct over data() for strings.

    Me: Well, it looks like we're not going to convince one another. But I would just like to make a few last comments.

    You've talked about following good coding practices in other reviews. Well, a good rule of thumb is, respect the interface. Make your code unquestionably correct. If anything, casting the constness away in order to violate an interface is not (in my opinion) good coding standards. Also, by doing so you actually add to the burden of maintenance, because now some programmer coming along (junior or otherwise) will look at that and say...

    “...const_cast<>, does that really not get modified?”
    “...does that really return a null terminated pointer to char?”
    “...this looks like a bug waiting to happen, what if...”
    etc...


    By the way, the standard does not guarantee that data() return a null terminated pointer to char. Just because all known current implementations guarantee that for there own compiler is not a guarantee for all other and future compilers in existence. It's a guarantee if the standard says its a guarantee. And even the newest standard doesn't guarantee that. But it does guarantee contiguous memory for the internal layout. It also says that modifying the buffer that is returned by data() is still undefined behavior.

    So that's it so far. The day was late and I left work.

    I would like to say I have no intention on changing the code that I have written based on our conversation, unless he goes to management and tries to force me to see things his way. They do look at him in such high regard that I may have no choice. In that case, you can bet I most certainly will put in comments indicating how I feel these changes are inappropriate and flawed.

    Please feel free to rip both sides of this conversation apart. If there are glaring errors (by either side), please comment on them. Thanks.

Page 2 of 3 FirstFirst 123 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured