http://en.cppreference.com/w/cpp/str...ic_string/data
Printable View
If you were in my code review, anything that doesn't require const_cast is the only correct solution. The contiguousness of std::string::data/c_str is irrelevant since they can't satisfy the interface requirements. Respect the interface, even if you know it's wrong but can't change it.
>> It is not documented anywhere in the library that the contents are or are not modified. And again, it's better to be safe than sorry.
Your primary and only argument should be "respect the interface" with a dash of "better to be safe than sorry". Don't get side tracked on contiguousness - though I will say that if it isn't guaranteed explicitly by the C++ standard your working with, don't rely on it (another dash of safe-than-sorry).
Make your code unquestionably correct so these code reviews don't last as long :)
gg
"Make your code unquestionably correct..."
Sound advice :thumb:
I agree with this 100%. Sometimes I feel like I'm talking to a wall on some of these issues though. One of the arguments used by this "primary" code reviewer was "...the maintenance burden of using vectors of chars does not justify its use over std::string::data() since both are guaranteed to work." This is absolutely ridiculous and I completely disagree with this statement. This is one of the reasons why I requested some help with some hard cold proof as to the incorrectness and flaw in this logic.
Thanks for your input!
data() does not guarantee null termination; c_str() guarantees null termination.Quote:
Originally Posted by sszd
I think that this is by omission, i.e., since there was no requirement for storage to be contiguous, storage need not be contiguous. Herb Sutter has a word on this: Cringe not: Vectors are guaranteed to be contiguous (his reply concerning std::string on 2008-04-08 at 12:28 pm).Quote:
Originally Posted by sszd
With respect to his statement that "current ISO C++ does require &str[0] to cough up a pointer to contiguous string data", I note that there was a defect in the standard in that C++03 defined operator[] as returning data()[pos] (or rather "as if" it did so), but of course this cannot be so for the non-const overload, so it leaves open to interpretation just what that might mean if an implementor wanted to push the boundaries.
Implementation defined.Quote:
Originally Posted by sszd
EDIT:
Yes. C++11 did not change that.Quote:
Originally Posted by sszd
I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.Quote:
Originally Posted by Codeplug
You are confusing two different notions here:
c_str and data have ALWAYS guaranteed to return valid c-strings (ergo, contiguous in memory). The only problem is that is that the returned c-string is actually const, and you can't mutate it. This is still the case with C++11.
C++11, on the other hand, added the guarantee that the storage of the inner chars that compose that std::string are contiguous. This means that if you extract a pointer from a_string[0], then it makes sense to iterate on that string.
----
It would help to understand the whys of motivations of such a choice behind that choice.
Historically, there have been tons (tons!) of ways to implement strings. All have there ups and downs. Some use reference counting, some used copy on write, some used a deque like approach etc... One of the major aspects of c++ has always been "The user has the choice", so the standard said "implement as you like, we force nothing on you". So they did not enforce contiguous storage.
However, even if users had access to a beautiful c++ string class, C did not disappear, and having a way to extract a c-string from an std::string is a sine-qua-non requirement of std::string. Hence the c_str and data methods. Unfortunately, the returned string may or may not point to the actual inner working of the string. For example, a deque-based implementation could fill a temporary buffer to return on every call. Or, in the case of a copy-on-write string, the "internal" string could be potentially shared across different string objects. At this point mutating the returned c-str could create all sorts of trouble, so it was decided to ban that, and return a const char*.
----
C++11 introduced move semantics, which pretty removes the need for copy on write. Also, C++11 noticed that everyone was using contiguous memory anyways. Therefore, it was decided to enforce contiguous storage to allow the famous &string[0], or &*astring.begin(). In theory, under these new conditions, c_str and data have no reason to return const data, but changing it now could have broken existing code.
----
So there you have it.
Disclaimer: I made up the "history", but the rationale is pretty accurate, I think.
data() does not provide that guarantee though since it does not guarantee null termination, otherwise it would be a wholly redundant part of the interface. (Wait, it is now a wholly redundant part of the interface, heheh.)Quote:
Originally Posted by monarch_dodra
[QUOTE=laserlight;2073599I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.[/QUOTE]
In one of our projects, we had a few C (and C++) APIs that did not have const correct input (eg, everything was a non-const pointer).
We wrapped all these APIs inside a new const-correct interface. The use of const cast is minimal, and well documented, and mimited to a single object. We had someone take time to validate EXACTLY which const_casts were legal, and which aren't. Inside the rest of our code, we use the const-correct interface with full confidence.
The problem (I find) with const_casting is it is like riding a car without a seat-belt. You take a risk every time, and every time you do it, you are less and less careful...
I've even seen once a function: "we promise not to modify your input", but inside the implementation: "We modify the input for performance, but we remember to change it back before the end"...
>> One of the arguments... "...the maintenance burden of using vectors of chars does not justify its use ..."
If any reader of the code has to stop and ask themselves:
"hmm, const_cast<>, does that really not get modded?"
"hmm, is that really guaranteed to be contiguous?"
"hmm, this looks like a bug waiting to happen, what if..."
Then you've added readability and maintenance burden.
The interface is non-const. There is no guarantee that the interface won't modify the non-const buffer. Therefore there is no argument for using const_cast that could possibly hold water.
gg
well, given memory contiguity, there should be nothing wrong in doing something like
this may prevent most std::vector conversions ...Code:std::string s = ...; // s mutable
if( !s.empty() )
some_c_api_call( s.data() == &s[0] ? const_cast<char*>( s.data() ) : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
Hmm... but once you know that s.data() == &s[0], then there's no need to use the const_cast because you can use &s[0]. It wouldn't be wrong, but it also wouldn't be necessary.
A particularly perverse implementation could decide to alias s[0] and *s.data(), but store the rest of the mutable data info somewhere else...
&s[0] would be wrong (as the memory would not be contiguous), and const_casting the data would also be wrong too.
Likely? No. But as long as we're on the subject of correctness...
I'm still having difficulty with this conversation between this one code reviewer and myself. I think I'm going to simply stop trying to convince him and just move on.
The code review, as well as our entire conversation has taken place over a series of emails throughout the day. If you don't mind, I'd like to get some feedback from some of you out there with regard to how this conversation progressed. I've posted the email threads below so you can see exactly how this conversation went, and each of our trains of thought. I'm sure my side of the conversation could have been better, but I am intimated by this individual due to the fact that he is held in such high regard by management, and I certainly did not want to offend him, although I may have done just that.
My comments will be in blue, the code review's in green. Parts of the conversation that I stole from the internet (mostly quotes from various sites) will be in red. Oh, and I hope you don't mind, but I also integrated some statements from several of you out there into my own parts of the conversation. Many of you have been extremely helpful.
Reviewer: There is no need to use a std::vector<char> instead of a std::string. The underlying storage of a std::string is guaranteed to be contiguous just like a std::vector (and std::string's storage can be accessed directly by using the “data” method).
Me: Unless the C++ standard has changed (and we are using a fairly old compiler, relatively speaking), the last time I researched how strings are stored (albeit it's been many years), they are not guaranteed to be contiguous. It's completely up to the compiler designers on how this type of container is implemented, although most compilers are indeed implemented this way. Vectors, on the other hand, are guaranteed to be contiguous. With that said, it is extremely dangerous to pass a pointer to the internal memory of a string to a function that requires a pointer to a non-const string. That would of course require casting the constness away. I'm sure there's a very good reason that the data() method of string only returns pointer to const char. According to the standard, modifying the contents results in “undefined” behavior. Since the external C functions in the library being called in the code require non-const arguments, and not knowing whether or not those arguments are being modified, it's much safer to put the data in a buffer (in this case a non-const vector), and then pass that to the function.
Reviewer: I don't know the entire history of the definition of std::string, but its underlying memory has been guaranteed to be contiguous for many years. You are assuming that because you were able to get the underlying pointer of a vector without a const-cast that it is “defined” behavior, but as far as I know, dereferencing the first non-const element of a vector is not a “defined” interface any more than const-casting std::string::data(). Finally, even if it were, I would still suggest that the maintenance burden of using vectors of chars does not justify its use over std::string::data() since both are guaranteed to work.
Me: That was not an assumption. Getting the underlying address of a vector is well defined. And again, the underlying address of a string is “undefined behavior” if it is modified. It is not guaranteed to work.
Reviewer: You are mixing the concept of “undefined” with “guaranteed to work”. You are guaranteed that the pointer to memory is valid under defined conditions, so if you are just using the allocated memory as a buffer, it is guaranteed to work.
Me: Now I'm needing a sanity check. Here are some quotes I've found online.
from josuttis's book: “The C++ standard library does not state clearly whether the elements of a vector are requried to be contiguous memory. However, it is the intention that this is guaranteed and it will be fixed due to the defect report”
Reviewer: This information is obsolete; when was this book published? I would consider relying more on online sources for up-to-date information.
from Meyers book concerning using vector and string in legacy code: “The approach to getting a pointer to the beginning of the container data that works for vector isn't reliable for strings, because (1) the data for strings are not guaranteed to be stored in contiguous memory, and (2) the internal representation of a string is not guaranteed to end with a null character.”
Reviewer: More obsolete information. Note that I did not say this stuff has always been guaranteed by the C++ standard. It has been guaranteed in the present by the fact that all known implementations of std::string do guarantee these things, and the latest C++ standard that was approved last year guarantee that all future implementations will be compliant as well, so it is overly paranoid to assume otherwise.
Me: See final comments below.
Scott Meyer's has dedicated an entire chapter in his “Effective STL” book to usage of vector with legacy 'C' API's. He wouldn't have written it if he knew that vector was not guaranteed to be contiguous.
From Scott Meyer's “Effective STL” book: - a vector is guaranteed to be contiguous, and therefore it can be used in legacy C and C++ functions that require a pointer to a contiguous buffer.
Reviewer: I'm not claiming that what you did won't work, only that it's unnecessarily convoluted. I am concerned about the impact on maintenance and what junior engineers are learning.
Me: See final comments below.
Continuity is part of the vector abstraction. It's so important, in fact, that when it was discovered that the C++98 standard didn't completely guarantee continuity, the C++03 standard was amended to explicity add the guarantee.
Why is it so important that vectors be contiguous? Because that's what you need to guarantee that a vector is layout-compatible with a C array, and therefore we have no reason not to use vector as a superior and type-safe alternative to arrays even when we need to exchange data with C code. So vector is our gateway to other languages and most operating systems, whose lingua franca is the venerable C array.
Reviewer: That may have been true at the time he wrote it, but it is certainly not true now (std::array is more like a C array than a std::vector). In any case, he only stated that a vector is superior to a C array; there is no comment about whether a std::string is ever superior to a std::vector<char>.
From cppreference.com: - std::basic_string::data – const CharT* data() const; - Returns pointer to the underlying array serving as character storage.
Writing to the character array accessed through data is undefined behavior.
The pointer obtained from data() should not be considered valid after any non-const operation on the string.
Since C++11, data() and c_str() perform the same function.
from the C++11 standard clause 21.4.1 Paragraph 5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
Me: You can see that the new standard does indeed guarantee std::string to be contiguous, however, we are not using a C++11 compliant compiler. So until that happens, I still (if for nothing more than safety) perfer to use a vector in cases like this.
Reviewer: As noted above, there is no safety benefit because all known and future implementations of std::string are contiguous, but there is a maintenance burden in using std::vector<char> here. As a side issue, note that the new standard also adds the “data” method to vector and other containers, as calling that method for the underlying pointer is less obfuscating than the &foo[0] construct. For some reason, only the std::string doesn't have a non-const version of data added, perhaps over concern over backwards compatibilty. I wouldn't be bothered if you perferred the &foo[0] construct over data() for strings.
Me: Well, it looks like we're not going to convince one another. But I would just like to make a few last comments.
You've talked about following good coding practices in other reviews. Well, a good rule of thumb is, respect the interface. Make your code unquestionably correct. If anything, casting the constness away in order to violate an interface is not (in my opinion) good coding standards. Also, by doing so you actually add to the burden of maintenance, because now some programmer coming along (junior or otherwise) will look at that and say...
“...const_cast<>, does that really not get modified?”
“...does that really return a null terminated pointer to char?”
“...this looks like a bug waiting to happen, what if...”
etc...
By the way, the standard does not guarantee that data() return a null terminated pointer to char. Just because all known current implementations guarantee that for there own compiler is not a guarantee for all other and future compilers in existence. It's a guarantee if the standard says its a guarantee. And even the newest standard doesn't guarantee that. But it does guarantee contiguous memory for the internal layout. It also says that modifying the buffer that is returned by data() is still undefined behavior.
So that's it so far. The day was late and I left work.
I would like to say I have no intention on changing the code that I have written based on our conversation, unless he goes to management and tries to force me to see things his way. They do look at him in such high regard that I may have no choice. In that case, you can bet I most certainly will put in comments indicating how I feel these changes are inappropriate and flawed.
Please feel free to rip both sides of this conversation apart. If there are glaring errors (by either side), please comment on them. Thanks.