-
std::string and contiguous memory?
I need some help arguing a point. Here's the situation. I recently wrote some code that required making several calls into a COTS API library where the function calls required one of the arguments to be a non-const pointer to char. I learned a long time ago that you should never mess the internal memory of a std::string. With that in mind, whenever I've run across situations like this I've always copied a std::string into a non-const vector of chars (std::vector<char>). I then pass the address of the first element into the function as such (&v[0]). In using the vector as a buffer like this, I don't care whether the function modifies the argument or not.
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function since the returned memory of this call is guaranteed to be contiguous. Of course, that would require casting the constness away from the pointer. I argued that the implementation of a std::string is at the discretion of the compiler designers. Because of this, the internal memory of a std::string in not guaranteed to be contiguous, although, most probably do implement it that way. I went on to say that messing with its contents has undefined behavior. A vector on the other hand is guaranteed to have contiguous memory. My first question is... am I off my rocker, or are these statements correct?
He then went on to say that I was “assuming” that the implementation of taking the address of the first element of a vector is well defined. Well, isn't it?
The gist of all this is that I am either completely out to lunch, or I need some hard evidence to back my claims. What I was wondering is if someone has access to the C++ standards (pre-C++11), could you provide some quoted statements from the document to help prove my point (assuming, of course, I'm not delirious). Any comments, statements, links, or just some simple quotes from reliable sources would be greatly appreciated. Thanks for your help.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by sszd
I argued that the implementation of a std::string is at the discretion of the compiler designers. Because of this, the internal memory of a std::string in not guaranteed to be contiguous, although, most probably do implement it that way. I went on to say that messing with its contents has undefined behavior.
You were right, however, since the 2011 edition of the C++ standard, the internal storage of the contents of a std::string is guaranteed to be contiguous:
Quote:
Originally Posted by C++11 Clause 21.4.1 Paragraph 5
The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
That said, instead of calling data() and then casting away const-ness, I think that it would be better to use the &str[0] idiom (like how it is for std::vector), after checking that the string is not empty.
Quote:
Originally Posted by sszd
He then went on to say that I was “assuming” that the implementation of taking the address of the first element of a vector is well defined. Well, isn't it?
Yes, that has been guaranteed since the 2003 edition of the C++ standard.
-
Re: std::string and contiguous memory?
Thank you for the reply. However, I was actually looking for quotes from the standards previous to C++11 since we are using older gnu compilers and have no intention on upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by sszd
However, I was actually looking for quotes from the standards previous to C++11 since we are using older gnu compilers and have no intention on upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
Since you are compiling using a known set of compilers, you could just check if the given standard library implementations store the contents of std::string contiguously. If one of them doesn't, or if you cannot determine this, then I would not recommend using &s[0] as it is better to be safe than sorry.
Another thing to consider:
Quote:
Originally Posted by sszd
COTS API library where the function calls required one of the arguments to be a non-const pointer to char.
If this is due to a legacy interface that is not const-correct, and it is documented that the contents of the array is not modified through that pointer to non-const char, then using data() and casting away const-ness is fine.
-
Re: std::string and contiguous memory?
Does it really matter knowing if the string stores its internal contents contiguously? Last I checked, "data" and "c_str" are guaranteed to return valid (const) c-strings anyways, regardless of the version.
If you use these, then you are 100% safe. You can't mutate though...
Quote:
Originally Posted by
laserlight
Yes, that has been guaranteed since the 2003 edition of the C++ standard.
Really? I would have thought it be guaranteed since day 0.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by monarch_dodra
You can't mutate though...
Which is the reason for this thread: the pointer being passed is a pointer to non-const char (though I note that sszd wrote "non-const pointer to char", but I doubt he/she meant "pointer, that is const, to non-const char" ;) ).
Quote:
Originally Posted by monarch_dodra
Really? I would have thought it be guaranteed since day 0.
I believe it was a defect in the original version, i.e., they forgot to require it. That said, I would bet no serious standard library implementation ever had std::vector store its contents in a non-contiguous fashion.
-
Re: std::string and contiguous memory?
... and if you need a mutable char array, you could check memory contiguity at runtime, eventually in debug mode, or, in release, eventually falling back to std::vector<char>. Say, something like
Code:
bool is_memory_contiguous( const std::string& s )
{
return std::adjacent_find( s.cbegin(), s.cend(), []( const char& l, const char& r ) { return &l + 1 != &r; } ) == s.cend();
}
// used as
std::string s = ...;
_ASSERT( is_memory_contiguous( s ) );
if( !s.empty() )
some_c_api_call( &s[0], s.size() );
// or
std::string s = ...;
if( !s.empty() )
some_c_api_call( is_memory_contiguous( s ) ? &s[0] : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
sszd
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function since the returned memory of this call is guaranteed to be contiguous. Of course, that would require casting the constness away from the pointer.
The potential error is in the const-cast. If the memory pointed to by the (const-casted) pointer is not modified by the function, all is fine. But if it is modified, you could be looking at undefined behavior. The standard states that modifying the contents of a const object is undefined behavior. In principle, the string implementation could use a const object under the hood, e.g. for empty strings.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
laserlight
Which is the reason for this thread: the pointer being passed is a pointer to non-const char (though I note that sszd wrote "non-const pointer to char", but I doubt he/she meant "pointer, that is const, to non-const char" ;) ).
Well, he did say the method being passed to guaranteed no mutation would occur, so followed up with a const_cast should be fine.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by monarch_dodra
Well, he did say the method being passed to guaranteed no mutation would occur, so followed up with a const_cast should be fine.
Hmm... could you quote that part? I can't seem to find it among what sszd wrote, and I looked over both posts carefully. Twice :(
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
sszd
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function since the returned memory of this call is guaranteed to be contiguous.
OK, but I don't understand this:
Quote:
Of course, that would require casting the constness away from the pointer.
Why is this necessary? What's the signature of the called function? If it is not const char*, then why isn't it const char*?
Quote:
I went on to say that messing with its contents has undefined behavior.
As of pre-2011 ANSI C++, yes, modifying the buffer that is returned by data() is undefined behavior.
Quote:
A vector on the other hand is guaranteed to have contiguous memory. My first question is... am I off my rocker, or are these statements correct?
A vector is guaranteed to be contiguous -- this is stated in the ANSI/ISO specification.
Also, you can show him what one well-respected "resident expert", Scott Meyers, says in one of his books (I think it's Effective STL) -- a vector is guaranteed to be contiguous, and therefore it can be used in legacy C and C++ functions that require a pointer to a contiguous buffer. So who are you to believe, Scott Meyers or your code reviewer?
Regards,
Paul McKenzie
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by Paul McKenzie
As of pre-2011 ANSI C++, yes, modifying the buffer that is returned by data() is undefined behavior.
It will still result in undefined behaviour. It is the &s[0] version that you're thinking of.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
sszd
Thank you for the reply. However, I was actually looking for quotes from the standards previous to C++11 since we are using older gnu compilers and have no intention on upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
From the 2003 standard:
Quote:
23.2.4 Class template vector
[1] A vector is a kind of sequence that supports random access iterators. In addition, it supports (amortized)
constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage
management is handled automatically, though hints can be given to improve efficiency. The elements of a
vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type
other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
laserlight
Hmm... could you quote that part? I can't seem to find it among what sszd wrote, and I looked over both posts carefully. Twice :(
My Bad, I was reading
laserlight's post.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by laserlight
Since you are compiling using a known set of compilers, you could just check if the given standard library implementations store the contents of std::string contiguously. If one of them doesn't, or if you cannot determine this, then I would not recommend using &s[0] as it is better to be safe than sorry.
Well, unless we were using a compiler that supports C++11, I am reluctant to use that construct regardless. And yes, I agree it's better to be safe than sorry. That's why I moved the contents into a vector in the first place.
Quote:
Originally Posted by laserlight
If this is due to a legacy interface that is not const-correct, and it is documented that the contents of the array is not modified through that pointer to non-const char, then using data() and casting away const-ness is fine.
It is not documented anywhere in the library that the contents are or are not modified. And again, it's better to be safe than sorry. One question though, is data() guarenteed to return a null terminated string, or is that also implementation dependent? If not, this could also potentially cause a core dump by the library call.
Quote:
Originally Posted by D_Drmmr
If the memory pointed to by the (const-casted) pointer is not modified by the function, all is fine.
Yes, but we don't know that. And all would be fine only if data() returned a null terminated string.
Quote:
Originally Posted by Paul McKenzie
What's the signature of the called function? If it is not const char*, then why isn't it const char*?
Actually I mentioned that the signature of the COTS library requires a non-const pointer to char. What I should have said is a pointer to a non-const char, or more appropriately, a non-const pointer to a non-const char (i.e. char*), sorry. Anyway, I have no idea why the library calls were written that way. It is an extremely old COTS product that we are simply having to deal with. Oh, and thanks for the Scott Meyers quote!
Quote:
Originally Posted by Philip Nicoletti
From the 2003 standard:
Thanks for the quote.
==========================================================
So, after all of this, I have a few more questions.
Does anyone have a quote from the standard that indicates std::string is implementation dependent and is not guaranteed to be in contiguous memory (prior to the C++11 standard of course)?
Does data() return a char const* to memory that is independent of the memory where the actual string is stored, or is this also implementation dependent?
Is the location of the data that data() returns guaranteed to be null terminated?
The answers to these would indicate to me whether or not it’s safe to cast the const-ness away from the return from data(), regardless of whether or not the function is modifying the contents. But I usually steer away from dangerous (IMHO) stuff like this anyway and simply try to play it safe. That’s why I used a vector. My co-worker is very persistent though, so I just needed some proof.
-
Re: std::string and contiguous memory?
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
JohnW@Wessex
Thanks for the link. One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?
-
Re: std::string and contiguous memory?
If you were in my code review, anything that doesn't require const_cast is the only correct solution. The contiguousness of std::string::data/c_str is irrelevant since they can't satisfy the interface requirements. Respect the interface, even if you know it's wrong but can't change it.
>> It is not documented anywhere in the library that the contents are or are not modified. And again, it's better to be safe than sorry.
Your primary and only argument should be "respect the interface" with a dash of "better to be safe than sorry". Don't get side tracked on contiguousness - though I will say that if it isn't guaranteed explicitly by the C++ standard your working with, don't rely on it (another dash of safe-than-sorry).
Make your code unquestionably correct so these code reviews don't last as long :)
gg
-
Re: std::string and contiguous memory?
"Make your code unquestionably correct..."
Sound advice :thumb:
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
Codeplug
If you were in my code review, anything that doesn't require const_cast is the only correct solution. The contiguousness of std::string::data/c_str is irrelevant since they can't satisfy the interface requirements. Respect the interface, even if you know it's wrong but can't change it.
I agree with this 100%. Sometimes I feel like I'm talking to a wall on some of these issues though. One of the arguments used by this "primary" code reviewer was "...the maintenance burden of using vectors of chars does not justify its use over std::string::data() since both are guaranteed to work." This is absolutely ridiculous and I completely disagree with this statement. This is one of the reasons why I requested some help with some hard cold proof as to the incorrectness and flaw in this logic.
Thanks for your input!
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by sszd
One question though, is data() guarenteed to return a null terminated string, or is that also implementation dependent? If not, this could also potentially cause a core dump by the library call.
data() does not guarantee null termination; c_str() guarantees null termination.
Quote:
Originally Posted by sszd
Does anyone have a quote from the standard that indicates std::string is implementation dependent and is not guaranteed to be in contiguous memory (prior to the C++11 standard of course)?
I think that this is by omission, i.e., since there was no requirement for storage to be contiguous, storage need not be contiguous. Herb Sutter has a word on this: Cringe not: Vectors are guaranteed to be contiguous (his reply concerning std::string on 2008-04-08 at 12:28 pm).
With respect to his statement that "current ISO C++ does require &str[0] to cough up a pointer to contiguous string data", I note that there was a defect in the standard in that C++03 defined operator[] as returning data()[pos] (or rather "as if" it did so), but of course this cannot be so for the non-const overload, so it leaves open to interpretation just what that might mean if an implementor wanted to push the boundaries.
Quote:
Originally Posted by sszd
Does data() return a char const* to memory that is independent of the memory where the actual string is stored, or is this also implementation dependent?
Implementation defined.
EDIT:
Quote:
Originally Posted by sszd
One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?
Yes. C++11 did not change that.
Quote:
Originally Posted by Codeplug
anything that doesn't require const_cast is the only correct solution.
I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
sszd
Thanks for the link. One question though, under the notes section it says... "Writing to the character array accessed through data is undefined behavior." Does this apply even to the C++11 standard even though the memory is guaranteed to be contiguous?
You are confusing two different notions here:
c_str and data have ALWAYS guaranteed to return valid c-strings (ergo, contiguous in memory). The only problem is that is that the returned c-string is actually const, and you can't mutate it. This is still the case with C++11.
C++11, on the other hand, added the guarantee that the storage of the inner chars that compose that std::string are contiguous. This means that if you extract a pointer from a_string[0], then it makes sense to iterate on that string.
----
It would help to understand the whys of motivations of such a choice behind that choice.
Historically, there have been tons (tons!) of ways to implement strings. All have there ups and downs. Some use reference counting, some used copy on write, some used a deque like approach etc... One of the major aspects of c++ has always been "The user has the choice", so the standard said "implement as you like, we force nothing on you". So they did not enforce contiguous storage.
However, even if users had access to a beautiful c++ string class, C did not disappear, and having a way to extract a c-string from an std::string is a sine-qua-non requirement of std::string. Hence the c_str and data methods. Unfortunately, the returned string may or may not point to the actual inner working of the string. For example, a deque-based implementation could fill a temporary buffer to return on every call. Or, in the case of a copy-on-write string, the "internal" string could be potentially shared across different string objects. At this point mutating the returned c-str could create all sorts of trouble, so it was decided to ban that, and return a const char*.
----
C++11 introduced move semantics, which pretty removes the need for copy on write. Also, C++11 noticed that everyone was using contiguous memory anyways. Therefore, it was decided to enforce contiguous storage to allow the famous &string[0], or &*astring.begin(). In theory, under these new conditions, c_str and data have no reason to return const data, but changing it now could have broken existing code.
----
So there you have it.
Disclaimer: I made up the "history", but the rationale is pretty accurate, I think.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by monarch_dodra
c_str and data have ALWAYS guaranteed to return valid c-strings (ergo, contiguous in memory).
data() does not provide that guarantee though since it does not guarantee null termination, otherwise it would be a wholly redundant part of the interface. (Wait, it is now a wholly redundant part of the interface, heheh.)
-
Re: std::string and contiguous memory?
[QUOTE=laserlight;2073599I don't agree: with a legacy interface that is not const-correct, the use of const_cast can be valid. In this case, it wouldn't be.[/QUOTE]
In one of our projects, we had a few C (and C++) APIs that did not have const correct input (eg, everything was a non-const pointer).
We wrapped all these APIs inside a new const-correct interface. The use of const cast is minimal, and well documented, and mimited to a single object. We had someone take time to validate EXACTLY which const_casts were legal, and which aren't. Inside the rest of our code, we use the const-correct interface with full confidence.
The problem (I find) with const_casting is it is like riding a car without a seat-belt. You take a risk every time, and every time you do it, you are less and less careful...
I've even seen once a function: "we promise not to modify your input", but inside the implementation: "We modify the input for performance, but we remember to change it back before the end"...
-
Re: std::string and contiguous memory?
>> One of the arguments... "...the maintenance burden of using vectors of chars does not justify its use ..."
If any reader of the code has to stop and ask themselves:
"hmm, const_cast<>, does that really not get modded?"
"hmm, is that really guaranteed to be contiguous?"
"hmm, this looks like a bug waiting to happen, what if..."
Then you've added readability and maintenance burden.
The interface is non-const. There is no guarantee that the interface won't modify the non-const buffer. Therefore there is no argument for using const_cast that could possibly hold water.
gg
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
Codeplug
Therefore there is no argument for using const_cast that could possibly hold water.
well, given memory contiguity, there should be nothing wrong in doing something like
Code:
std::string s = ...; // s mutable
if( !s.empty() )
some_c_api_call( s.data() == &s[0] ? const_cast<char*>( s.data() ) : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
this may prevent most std::vector conversions ...
-
Re: std::string and contiguous memory?
Hmm... but once you know that s.data() == &s[0], then there's no need to use the const_cast because you can use &s[0]. It wouldn't be wrong, but it also wouldn't be necessary.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
Codeplug
If any reader of the code has to stop and ask themselves:
"hmm, const_cast<>, does that really not get modded?"
"hmm, is that really guaranteed to be contiguous?"
"hmm, this looks like a bug waiting to happen, what if..."
Then you've added readability and maintenance burden.
This is a good argument. Thanks.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
superbonzo
well, given memory contiguity, there should be nothing wrong in doing something like
Code:
std::string s = ...; // s mutable
if( !s.empty() )
some_c_api_call( s.data() == &s[0] ? const_cast<char*>( s.data() ) : &std::vector<char>( s.begin(), s.end() )[0], s.size() );
this may prevent most std::vector conversions ...
Quote:
Originally Posted by
laserlight
Hmm... but once you know that s.data() == &s[0], then there's no need to use the const_cast because you can use &s[0]. It wouldn't be wrong, but it also wouldn't be necessary.
A particularly perverse implementation could decide to alias s[0] and *s.data(), but store the rest of the mutable data info somewhere else...
&s[0] would be wrong (as the memory would not be contiguous), and const_casting the data would also be wrong too.
Likely? No. But as long as we're on the subject of correctness...
-
Re: std::string and contiguous memory?
I'm still having difficulty with this conversation between this one code reviewer and myself. I think I'm going to simply stop trying to convince him and just move on.
The code review, as well as our entire conversation has taken place over a series of emails throughout the day. If you don't mind, I'd like to get some feedback from some of you out there with regard to how this conversation progressed. I've posted the email threads below so you can see exactly how this conversation went, and each of our trains of thought. I'm sure my side of the conversation could have been better, but I am intimated by this individual due to the fact that he is held in such high regard by management, and I certainly did not want to offend him, although I may have done just that.
My comments will be in blue, the code review's in green. Parts of the conversation that I stole from the internet (mostly quotes from various sites) will be in red. Oh, and I hope you don't mind, but I also integrated some statements from several of you out there into my own parts of the conversation. Many of you have been extremely helpful.
Reviewer: There is no need to use a std::vector<char> instead of a std::string. The underlying storage of a std::string is guaranteed to be contiguous just like a std::vector (and std::string's storage can be accessed directly by using the “data” method).
Me: Unless the C++ standard has changed (and we are using a fairly old compiler, relatively speaking), the last time I researched how strings are stored (albeit it's been many years), they are not guaranteed to be contiguous. It's completely up to the compiler designers on how this type of container is implemented, although most compilers are indeed implemented this way. Vectors, on the other hand, are guaranteed to be contiguous. With that said, it is extremely dangerous to pass a pointer to the internal memory of a string to a function that requires a pointer to a non-const string. That would of course require casting the constness away. I'm sure there's a very good reason that the data() method of string only returns pointer to const char. According to the standard, modifying the contents results in “undefined” behavior. Since the external C functions in the library being called in the code require non-const arguments, and not knowing whether or not those arguments are being modified, it's much safer to put the data in a buffer (in this case a non-const vector), and then pass that to the function.
Reviewer: I don't know the entire history of the definition of std::string, but its underlying memory has been guaranteed to be contiguous for many years. You are assuming that because you were able to get the underlying pointer of a vector without a const-cast that it is “defined” behavior, but as far as I know, dereferencing the first non-const element of a vector is not a “defined” interface any more than const-casting std::string::data(). Finally, even if it were, I would still suggest that the maintenance burden of using vectors of chars does not justify its use over std::string::data() since both are guaranteed to work.
Me: That was not an assumption. Getting the underlying address of a vector is well defined. And again, the underlying address of a string is “undefined behavior” if it is modified. It is not guaranteed to work.
Reviewer: You are mixing the concept of “undefined” with “guaranteed to work”. You are guaranteed that the pointer to memory is valid under defined conditions, so if you are just using the allocated memory as a buffer, it is guaranteed to work.
Me: Now I'm needing a sanity check. Here are some quotes I've found online.
from josuttis's book: “The C++ standard library does not state clearly whether the elements of a vector are requried to be contiguous memory. However, it is the intention that this is guaranteed and it will be fixed due to the defect report”
Reviewer: This information is obsolete; when was this book published? I would consider relying more on online sources for up-to-date information.
from Meyers book concerning using vector and string in legacy code: “The approach to getting a pointer to the beginning of the container data that works for vector isn't reliable for strings, because (1) the data for strings are not guaranteed to be stored in contiguous memory, and (2) the internal representation of a string is not guaranteed to end with a null character.”
Reviewer: More obsolete information. Note that I did not say this stuff has always been guaranteed by the C++ standard. It has been guaranteed in the present by the fact that all known implementations of std::string do guarantee these things, and the latest C++ standard that was approved last year guarantee that all future implementations will be compliant as well, so it is overly paranoid to assume otherwise.
Me: See final comments below.
Scott Meyer's has dedicated an entire chapter in his “Effective STL” book to usage of vector with legacy 'C' API's. He wouldn't have written it if he knew that vector was not guaranteed to be contiguous.
From Scott Meyer's “Effective STL” book: - a vector is guaranteed to be contiguous, and therefore it can be used in legacy C and C++ functions that require a pointer to a contiguous buffer.
Reviewer: I'm not claiming that what you did won't work, only that it's unnecessarily convoluted. I am concerned about the impact on maintenance and what junior engineers are learning.
Me: See final comments below.
Continuity is part of the vector abstraction. It's so important, in fact, that when it was discovered that the C++98 standard didn't completely guarantee continuity, the C++03 standard was amended to explicity add the guarantee.
Why is it so important that vectors be contiguous? Because that's what you need to guarantee that a vector is layout-compatible with a C array, and therefore we have no reason not to use vector as a superior and type-safe alternative to arrays even when we need to exchange data with C code. So vector is our gateway to other languages and most operating systems, whose lingua franca is the venerable C array.
Reviewer: That may have been true at the time he wrote it, but it is certainly not true now (std::array is more like a C array than a std::vector). In any case, he only stated that a vector is superior to a C array; there is no comment about whether a std::string is ever superior to a std::vector<char>.
From cppreference.com: - std::basic_string::data – const CharT* data() const; - Returns pointer to the underlying array serving as character storage.
Writing to the character array accessed through data is undefined behavior.
The pointer obtained from data() should not be considered valid after any non-const operation on the string.
Since C++11, data() and c_str() perform the same function.
from the C++11 standard clause 21.4.1 Paragraph 5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
Me: You can see that the new standard does indeed guarantee std::string to be contiguous, however, we are not using a C++11 compliant compiler. So until that happens, I still (if for nothing more than safety) perfer to use a vector in cases like this.
Reviewer: As noted above, there is no safety benefit because all known and future implementations of std::string are contiguous, but there is a maintenance burden in using std::vector<char> here. As a side issue, note that the new standard also adds the “data” method to vector and other containers, as calling that method for the underlying pointer is less obfuscating than the &foo[0] construct. For some reason, only the std::string doesn't have a non-const version of data added, perhaps over concern over backwards compatibilty. I wouldn't be bothered if you perferred the &foo[0] construct over data() for strings.
Me: Well, it looks like we're not going to convince one another. But I would just like to make a few last comments.
You've talked about following good coding practices in other reviews. Well, a good rule of thumb is, respect the interface. Make your code unquestionably correct. If anything, casting the constness away in order to violate an interface is not (in my opinion) good coding standards. Also, by doing so you actually add to the burden of maintenance, because now some programmer coming along (junior or otherwise) will look at that and say...
“...const_cast<>, does that really not get modified?”
“...does that really return a null terminated pointer to char?”
“...this looks like a bug waiting to happen, what if...”
etc...
By the way, the standard does not guarantee that data() return a null terminated pointer to char. Just because all known current implementations guarantee that for there own compiler is not a guarantee for all other and future compilers in existence. It's a guarantee if the standard says its a guarantee. And even the newest standard doesn't guarantee that. But it does guarantee contiguous memory for the internal layout. It also says that modifying the buffer that is returned by data() is still undefined behavior.
So that's it so far. The day was late and I left work.
I would like to say I have no intention on changing the code that I have written based on our conversation, unless he goes to management and tries to force me to see things his way. They do look at him in such high regard that I may have no choice. In that case, you can bet I most certainly will put in comments indicating how I feel these changes are inappropriate and flawed.
Please feel free to rip both sides of this conversation apart. If there are glaring errors (by either side), please comment on them. Thanks.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by sszd
It has been guaranteed in the present by the fact that all known implementations of std::string do guarantee these things, and the latest C++ standard that was approved last year guarantee that all future implementations will be compliant as well, so it is overly paranoid to assume otherwise.
I think he's right here. You're not writing code to be that may be compiled with an older standard library implementation that may be unknown to you, so just check that what you have satisfies the assumption (it probably does). If so, all is well: go ahead and use &s[0] even if the standard library implementation was written at a time when the current guarantees did not exist.
-
Re: std::string and contiguous memory?
Quote:
Reviewer: I'm not claiming that what you did won't work, only that it's unnecessarily convoluted. I am concerned about the impact on maintenance and what junior engineers are learning.
Why does the reviewer think the other programmers are ignorant or can't simply look up this information? What the reviewer is stating sounds like projection -- he/she has a problem with it, so it must mean that others will have a problem with it. Unless you're using convoluted and obfuscated C coding in a C++ program, a reviewer should (or would) never say this, since IMO this is an indictment of the other programmers.
If these programmers don't know that vector is contiguous, then what other aspect of the standard library do they not know? Are you allowed to use vectors as a replacement for new[]/delete[]? How about usage of algorithms? Function objects? Might as well have the reviewer list them, so you don't waste your time writing code that uses these idioms.
Regards,
Paul McKenzie
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
laserlight
I think he's right here. You're not writing code to be that may be compiled with an older standard library implementation that may be unknown to you, so just check that what you have satisfies the assumption (it probably does). If so, all is well: go ahead and use &s[0] even if the standard library implementation was written at a time when the current guarantees did not exist.
Unfortunately, my real world experience kind of makes me feel the same way. Yes, it is wrong, but let's face it, it works, and it costs less.
What I suggest you do though, rather than doing an ugly const_cast, or an undefined &str[0]/&*str.begin(), is to use a non-member "data" function. C++11 defined non-member functions such as begin/end that could be used on either containers or with static arrays. Why not just do the same thing with a data function? Here is what I have in mind:
Code:
#include <vector>
#include <iostream>
#include <string>
#include <array>
#include "data.h"
int main() {
std::vector<int> vec1(10);
const std::vector<int> vec2(10);
char* p1 = "blablabla";
const char* p2 = "blablabla";
char a1[10] = "blablabla";
const char a2[10] = "blablabla";
std::string s1 = "blablabla";
const std::string s2 = "blablabla";
std::array<int, 3> ar1 = {1, 2, 3};
const std::array<int, 3> ar2 = {1, 2, 3};
std::cout << *data(vec1) << std::endl;
std::cout << *data(vec2) << std::endl;
std::cout << *data(p1) << std::endl;
std::cout << *data(p2) << std::endl;
std::cout << *data(a1) << std::endl;
std::cout << *data(a2) << std::endl;
std::cout << *data(s1) << std::endl;
std::cout << *data(s2) << std::endl;
std::cout << *data(ar1) << std::endl;
std::cout << *data(ar2) << std::endl;
}
With this approach, you have access to a generic/portable "data" method, that works for any compatible container-like object. The advantage is the developper doesn't have to worry since the public interface of "data" is: "I guarantee I work. Don't worry about it".
To implement this, you have two options:
The first one, the easy one:
Code:
//Generic container
template<typename T>
typename T::pointer data(T& input)
{
return const_cast<typename T::pointer>(input.data());
}
//Generic const_container
template<typename T>
typename T::const_pointer data(const T& input)
{
return input.data();
}
//Generic (const) pointer/array
template<typename T>
T* data(T* input)
{return input;}
By adding that const_cast, it makes sure the call works for either generic containers (vector/array), as well as strings classes. Yay!
I'd recommend this actually doing this though:
Code:
//Generic container
template<typename T>
typename T::pointer data(T& input)
{
return input.data();
}
//Generic const_container
template<typename T>
typename T::const_pointer data(const T& input)
{
return input.data();
}
//basic_string forward
namespace std{
template<typename C, typename T, typename A>
class basic_string;
}
//data for basic_string
template<typename C, typename T, typename A>
typename std::basic_string<C, T, A>::pointer data(std::basic_string<C, T, A>& input)
{
return const_cast<typename std::basic_string<C, T, A>::pointer>(input.data());
}
//data for const basic_string
template<typename C, typename T, typename A>
typename std::basic_string<C, T, A>::const_pointer data(const std::basic_string<C, T, A>& input)
{
return input.data();
}
//Generic (const) pointer/array
template<typename T>
T* data(T* input)
{return input;}
This is a bit more convoluted, but it separates the implementation for string_types. This can be useful if you want to track down all data calls to [|s|s|u13|u32]string's
EDIT: I just remembered you aren't using C++11, so you can replace the container's ".data" with ".empty? null, &[0];"
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by monarch_dodra
With this approach, you have access to a generic/portable "data" method, that works for any compatible container-like object. The advantage is the developper doesn't have to worry since the public interface of "data" is: "I guarantee I work. Don't worry about it".
Nice, but...
Quote:
Originally Posted by monarch_dodra
EDIT: I just remembered you aren't using C++11, so you can replace the container's ".data" with ".empty? null, &[0];"
There are containers for which &x[0] is valid, but which do not store the elements of x contiguously, e.g., std::deque (which does not have a data() member function in C++11). In such a case, this workaround would not result in a compile error, i.e., this non-member data() would not actually guarantee that it works.
-
Re: std::string and contiguous memory?
Why does he think using std::vector is liable to be a maintenance problem?
It's hardly a lot of extra code to write.
Code:
// Unsafe way
std::string text("Some text");
...
Function(const_cast<char*>(text.c_str()));
// Safe way
// Safe C string generator.
class SafeCString
{
public:
SafeCString(const std::string& text)
{
data.assign(text.c_str(), text.c_str() + text.size() + 1);
}
char* operator()()
{
return &data[0];
}
private:
std::vector<char> data;
};
std::string text("Some text");
...
Function(SafeCString(text)());
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
sszd
That’s why I used a vector. My co-worker is very persistent though, so I just needed some proof.
I think you're both wrong. There's no reason to assume anything about the internals of vector or string really. Use them at the level of abstraction they were designed for. Don't break encapsulation in ordinary code just because you can.
So create an ordinary char array, copy the string content to it and pass it to the function? That's the proper C solution to a C problem and it's guaranteed to be correct.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by nuzzle
I think you're both wrong. There's no reason to assume anything about the internals of vector or string really. Use them at the level of abstraction they were designed for. Don't break encapsulation in ordinary code just because you can.
I linked to Herb Sutter's blog post on this earlier, admittedly for a different reason, but the blog post itself was about whether using &v[0] for a std::vector v violates the abstraction: Cringe not: Vectors are guaranteed to be contiguous. He concludes that by design, it doesn't.
EDIT:
Consequently, the same idea applies for the guarantee of contiguous storage of the contents of basic_string in C++11.
Quote:
Originally Posted by nuzzle
So create an ordinary char array, copy the string content to it and pass it to the function? That's the proper C solution to a C problem and it's guaranteed to be correct.
You will then have to do manual memory management. Not the end of the world, but why do that when there's std::vector, or post-C++11 (or with suitable knowledge that it is safe) std::string?
-
Re: std::string and contiguous memory?
Quote:
One question though, is data() guaranteed to return a null terminated string, or is that also implementation dependent
Not in the old standard ... in C+11 data() is guaranteed to return a null terminated string.
c_str() is guaranteed in both versions.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
laserlight
Nice, but...
There are containers for which &x[0] is valid, but which do not store the elements of x contiguously, e.g., std::deque (which does not have a data() member function in C++11). In such a case, this workaround would not result in a compile error, i.e., this non-member data() would not actually guarantee that it works.
Crap :D
There goes my beautiful design :D
-
Re: std::string and contiguous memory?
-
Re: std::string and contiguous memory?
>> I would like to say I have no intention on changing the code that I have written based on our conversation, unless he goes to management and tries to force me to see things his way
Be mindful of the "office politics" in play here. You don't want to set an unfavorable perception of yourself in other's eyes.
>> The interface is non-const. There is no guarantee that the interface won't modify the non-const buffer. Therefore there is no argument for using const_cast that could possibly hold water.
There is no debate outside of that. Using const_cast<> in this case is just garbage. Once that fact is accepted, then move on to "how to do it right". If that fact won't be accepted, move on. Not much you can do besides laying down that facts (what the standard says), and the opinions of yourself and other experts in the field like Scott Meyers and Herb Sutter.
gg
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
laserlight
I linked to Herb Sutter's blog post on this earlier, admittedly for a different reason, but the blog post itself was about whether using &v[0] for a std::vector v violates the abstraction:
Herb Sutter makes it a special point that he's in disagreement with Andy Koenig so I guess I side with the latter.
My point is that you have a choise. You don't have to use every aspect of an interface. Just because it's there doesn't mean you have to use it. It's better to stay away from things that are shaky, questionable or just downright ugly in general.
As of C++ 11 vector and string finally have certain firm guarantees for how memory is handled internally. But to me that's a big yawn because I'm never going to make use of this anyway. Not even if Herb Sutter claims it's a proper part of a vector or string abstraction. I, and possibly Andy Koenig, still don't think it is. Especially not if the only motivation is that it belongs there because it is there. And note that the OP is using pre-version 11 C++ where the situation isn't that clear-cut.
In my view the OP should suggest an adapter class to convert a string into a suitable char array in a way that doesn't make any assumptions on the internals of string or vector at all. That will undisputably work. And this should be presented as a first step towards a more thorough interface to that API library that's used.
That's the constructive thing to do. Getting out of this silly conflict where both sides are wrong and start improving the program together. I'm not against deep discussions about technical detail but sometimes they become all trees that prevent you from seeing the wood.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
JohnW@Wessex
Why does he think using std::vector is liable to be a maintenance problem?
It's hardly a lot of extra code to write.
Truth. however, it lacks the "other" class, SafeMutableCString, if the point is to mutate the string:
Code:
struct SafeMutableCString
{
public:
explicit SafeMutableCString(std::string& str) //NON CONST THIS TIME
: m_str(str)
{
data.assign(m_str.c_str(), m_str.c_str() + str.size() + 1);
}
operator char*() //I changed this from char* operator()
{
return &data[0];
}
~SafeMutableCString() //Destructor writes back to string
{
m_str = &data[0];
}
private:
std::string& m_str;
std::vector<char> data;
};
BTW, I think the design is better with an explicit destructor, but implicit cast. The usage becomes more transparent, and I don't think there is any potential for wrong usage (since the constructor is explicit, and the ONLY point in creating the struct is to cast it anyways) :
Code:
void foo(char * p)
{
strcpy(p, "hello world!");
}
int main()
{
std::string text("Some text containing lots of letters");
foo(SafeMutableCString(text));
std::cout << text << std::endl;
}
Again, I think the strength in such a design is not the internals, but the public interface. In the long run, who cares how SafeMutableCString is written, as long as the coder can use it without worry. If your reviewer wants to replace the implementation with const_cast for efficiency reasons, then power to him. You, on the other hand, are using a method with an interface certified safe.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by nuzzle
You don't have to use every aspect of an interface. Just because it's there doesn't mean you have to use it. It's better to stay away from things that are shaky, questionable or just downright ugly in general.
I agree, but with respect to C++11, the &s[0] solution for std::string s is not shaky, questionable or downright ugly (a little ugly?). With respect to C++03, the &v[0] solution for an intermediate std::vector v is not shaky or questionable, but I concede that it is downright ugly compared to your suggestion of an adapter class... but then it could be used to implement the adapter class (like what JohnW@Wessex and monarch_dodra have been posting about recently).
Quote:
Originally Posted by nuzzle
Especially not if the only motivation is that it belongs there because it is there.
Sutter's assertion is that it is there because it belongs there and considering that the lack thereof was considered a defect in C++98, that makes sense to me. If it had been in C++98 rather than amended in C++03, then your argument might hold water.
Quote:
Originally Posted by nuzzle
In my view the OP should suggest an adapter class to convert a string into a suitable char array in a way that doesn't make any assumptions on the internals of string or vector at all. That will undisputably work.
If there is a guarantee, then there is no assumption. It will undisputably work.
Where the assumption comes in would be if sszd goes with the &s[0] solution for a std::string s, with the knowledge that the standard library implementation does not conform to C++11, but happens to use contiguous storage. It may be a practical solution, but it is still an assumption, not a guarantee.
-
Re: std::string and contiguous memory?
Quote:
Originally Posted by
monarch_dodra
BTW, I think the design is better with an explicit destructor,
I agree. That was a mistake on my part; I normally always make single parameter constructors explicit.