-
June 27th, 2012, 09:44 PM
#1
std::string and contiguous memory?
I need some help arguing a point. Here's the situation. I recently wrote some code that required making several calls into a COTS API library where the function calls required one of the arguments to be a non-const pointer to char. I learned a long time ago that you should never mess with the internal memory of a std::string. With that in mind, whenever I've run across situations like this I've always copied the std::string into a non-const vector of chars (std::vector<char>). I then pass the address of the first element into the function (&v[0]). Because the vector is only a buffer, I don't care whether the function modifies the argument or not.
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function, since the memory returned by this call is guaranteed to be contiguous. Of course, that would require casting the constness away from the pointer. I argued that the implementation of a std::string is at the discretion of the compiler designers. Because of this, the internal memory of a std::string is not guaranteed to be contiguous, although most implementations probably do store it that way. I went on to say that messing with its contents has undefined behavior. A vector, on the other hand, is guaranteed to have contiguous memory. My first question is... am I off my rocker, or are these statements correct?
He then went on to say that I was “assuming” that the implementation of taking the address of the first element of a vector is well defined. Well, isn't it?
The gist of all this is that I am either completely out to lunch, or I need some hard evidence to back my claims. What I was wondering is if someone has access to the C++ standards (pre-C++11), could you provide some quoted statements from the document to help prove my point (assuming, of course, I'm not delirious). Any comments, statements, links, or just some simple quotes from reliable sources would be greatly appreciated. Thanks for your help.
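For reference, the vector-as-buffer approach described in this post can be sketched roughly as follows. The function `legacy_api_call` is a hypothetical stand-in for the COTS call (its name and behavior are assumptions, not the real library), and the empty-string check is an extra precaution, since `&v[0]` is not valid for an empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the COTS function: takes a pointer to non-const
// char and may modify the buffer (here it upper-cases every 'a').
void legacy_api_call(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == 'a') buf[i] = 'A';
    }
}

// Copies s into a vector, lets the API work on the copy, and returns the
// (possibly modified) result. The original string is never touched.
std::string call_with_vector_buffer(const std::string& s) {
    if (s.empty()) return s;                  // &v[0] needs at least one element
    std::vector<char> v(s.begin(), s.end());  // mutable, contiguous copy
    legacy_api_call(&v[0], v.size());
    assert(v.size() == s.size());             // the call cannot resize the vector
    return std::string(v.begin(), v.end());
}
```

This compiles as C++03, so it should not depend on upgrading the compiler.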
-
June 27th, 2012, 09:56 PM
#2
Re: std::string and contiguous memory?
 Originally Posted by sszd
I argued that the implementation of a std::string is at the discretion of the compiler designers. Because of this, the internal memory of a std::string is not guaranteed to be contiguous, although most implementations probably do store it that way. I went on to say that messing with its contents has undefined behavior.
You were right. However, since the 2011 edition of the C++ standard, the internal storage of the contents of a std::string is guaranteed to be contiguous:
 Originally Posted by C++11 Clause 21.4.1 Paragraph 5
The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
That said, instead of calling data() and then casting away const-ness, I think that it would be better to use the &str[0] idiom (like how it is for std::vector), after checking that the string is not empty.
 Originally Posted by sszd
He then went on to say that I was “assuming” that the implementation of taking the address of the first element of a vector is well defined. Well, isn't it?
Yes, that has been guaranteed since the 2003 edition of the C++ standard.
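A minimal sketch of the &str[0] idiom mentioned above, assuming a standard library that stores std::string contents contiguously (guaranteed from C++11 on). The function `fill_buffer` is a hypothetical char* API invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical char* API (name and behavior are assumptions):
// overwrites every byte of the buffer with 'x'.
void fill_buffer(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) buf[i] = 'x';
}

// Passes the string's own storage to the API. Relies on contiguous storage.
void fill_in_place(std::string& s) {
    if (s.empty()) return;  // nothing to pass; skip the call entirely
    // Sanity check of the contiguity identity for the last element.
    assert(&s[0] + (s.size() - 1) == &s[s.size() - 1]);
    fill_buffer(&s[0], s.size());
}
```

The API must not write past s.size() characters; the string cannot grow through the raw pointer.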
Last edited by laserlight; June 27th, 2012 at 09:58 PM.
-
June 27th, 2012, 10:31 PM
#3
Re: std::string and contiguous memory?
Thank you for the reply. However, I was actually looking for quotes from the standards previous to C++11, since we are using older GNU compilers and have no intention of upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
-
June 27th, 2012, 10:47 PM
#4
Re: std::string and contiguous memory?
 Originally Posted by sszd
However, I was actually looking for quotes from the standards previous to C++11, since we are using older GNU compilers and have no intention of upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
Since you are compiling using a known set of compilers, you could just check if the given standard library implementations store the contents of std::string contiguously. If one of them doesn't, or if you cannot determine this, then I would not recommend using &s[0] as it is better to be safe than sorry.
Another thing to consider:
 Originally Posted by sszd
COTS API library where the function calls required one of the arguments to be a non-const pointer to char.
If this is due to a legacy interface that is not const-correct, and it is documented that the contents of the array is not modified through that pointer to non-const char, then using data() and casting away const-ness is fine.
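As a rough illustration of that last case, here is a sketch where the legacy function is documented as read-only. The function `count_spaces` is hypothetical; the const_cast is only acceptable because of the assumption that it never writes through the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical legacy function: the signature is not const-correct, but we
// assume its documentation states that the buffer is only read.
std::size_t count_spaces(char* text, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (text[i] == ' ') ++n;
    }
    return n;
}

// Wraps the legacy call. The length is passed explicitly, so the code does
// not depend on data() being null-terminated.
std::size_t count_spaces_in(const std::string& s) {
    return count_spaces(const_cast<char*>(s.data()), s.size());
}
```

If the function did write through the pointer, this would be undefined behavior, so the wrapper is only as safe as the documentation it relies on.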
-
June 28th, 2012, 02:34 AM
#5
Re: std::string and contiguous memory?
Does it really matter knowing if the string stores its internal contents contiguously? Last I checked, "data" and "c_str" are guaranteed to return valid (const) c-strings anyways, regardless of the version.
If you use these, then you are 100% safe. You can't mutate though...
 Originally Posted by laserlight
Yes, that has been guaranteed since the 2003 edition of the C++ standard.
Really? I would have thought it was guaranteed since day 0.
Is your question related to IO?
Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.
-
June 28th, 2012, 02:51 AM
#6
Re: std::string and contiguous memory?
 Originally Posted by monarch_dodra
You can't mutate though...
Which is the reason for this thread: the pointer being passed is a pointer to non-const char (though I note that sszd wrote "non-const pointer to char", but I doubt he/she meant "pointer, that is const, to non-const char" ).
 Originally Posted by monarch_dodra
Really? I would have thought it was guaranteed since day 0.
I believe it was a defect in the original version, i.e., they forgot to require it. That said, I would bet no serious standard library implementation ever had std::vector store its contents in a non-contiguous fashion.
-
June 28th, 2012, 02:56 AM
#7
Re: std::string and contiguous memory?
... and if you need a mutable char array, you could check memory contiguity at runtime, possibly only in debug mode, or, in release builds, fall back to std::vector<char> when the check fails. Say, something like
Code:
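// Requires <algorithm>, <string> and <vector>; the lambda and cbegin()/cend() need C++11.
// Returns true if every pair of adjacent characters is also adjacent in memory.
bool is_memory_contiguous( const std::string& s )
{
    return std::adjacent_find( s.cbegin(), s.cend(),
        []( const char& l, const char& r ) { return &l + 1 != &r; } ) == s.cend();
}

// used as
std::string s = ...;
_ASSERT( is_memory_contiguous( s ) ); // debug-build check only
if( !s.empty() )
    some_c_api_call( &s[0], s.size() );

// or
std::string s = ...;
// The temporary vector lives only until the end of the call statement,
// so anything the API writes into it is lost afterwards.
if( !s.empty() )
    some_c_api_call( is_memory_contiguous( s ) ? &s[0] : &std::vector<char>( s.begin(), s.end() )[0], s.size() );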
Last edited by superbonzo; June 28th, 2012 at 11:32 AM.
Reason: minor modification to code snippet
-
June 28th, 2012, 03:29 AM
#8
Re: std::string and contiguous memory?
 Originally Posted by sszd
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function since the returned memory of this call is guaranteed to be contiguous. Of course, that would require casting the constness away from the pointer.
The potential error is in the const-cast. If the memory pointed to by the (const-casted) pointer is not modified by the function, all is fine. But if it is modified, you could be looking at undefined behavior. The standard states that modifying the contents of a const object is undefined behavior. In principle, the string implementation could use a const object under the hood, e.g. for empty strings.
Cheers, D Drmmr
Please put [code][/code] tags around your code to preserve indentation and make it more readable.
As long as man ascribes to himself what is merely a possibility, he will not work for the attainment of it. - P. D. Ouspensky
-
June 28th, 2012, 04:01 AM
#9
Re: std::string and contiguous memory?
 Originally Posted by laserlight
Which is the reason for this thread: the pointer being passed is a pointer to non-const char (though I note that sszd wrote "non-const pointer to char", but I doubt he/she meant "pointer, that is const, to non-const char"  ).
Well, he did say that the function being called guarantees no mutation will occur, so following up with a const_cast should be fine.
-
June 28th, 2012, 04:09 AM
#10
Re: std::string and contiguous memory?
 Originally Posted by monarch_dodra
Well, he did say that the function being called guarantees no mutation will occur, so following up with a const_cast should be fine.
Hmm... could you quote that part? I can't seem to find it among what sszd wrote, and I looked over both posts carefully. Twice
-
June 28th, 2012, 04:21 AM
#11
Re: std::string and contiguous memory?
 Originally Posted by sszd
In a recent code review, the so-called resident “expert” on our program indicated that there was no reason to use a vector. I should use the data() method of the std::string to obtain an internal pointer to the string's memory and pass that into the function since the returned memory of this call is guaranteed to be contiguous.
OK, but I don't understand this:
Of course, that would require casting the constness away from the pointer.
Why is this necessary? What's the signature of the called function? If it is not const char*, then why isn't it const char*?
I went on to say that messing with its contents has undefined behavior.
As of pre-2011 ANSI C++, yes, modifying the buffer that is returned by data() is undefined behavior.
A vector on the other hand is guaranteed to have contiguous memory. My first question is... am I off my rocker, or are these statements correct?
A vector is guaranteed to be contiguous -- this is stated in the ANSI/ISO specification.
Also, you can show him what one well-respected "resident expert", Scott Meyers, says in one of his books (I think it's Effective STL) -- a vector is guaranteed to be contiguous, and therefore it can be used in legacy C and C++ functions that require a pointer to a contiguous buffer. So who are you to believe, Scott Meyers or your code reviewer?
Regards,
Paul McKenzie
-
June 28th, 2012, 04:34 AM
#12
Re: std::string and contiguous memory?
 Originally Posted by Paul McKenzie
As of pre-2011 ANSI C++, yes, modifying the buffer that is returned by data() is undefined behavior.
It will still result in undefined behaviour. It is the &s[0] version that you're thinking of.
-
June 28th, 2012, 05:10 AM
#13
Re: std::string and contiguous memory?
 Originally Posted by sszd
Thank you for the reply. However, I was actually looking for quotes from the standards previous to C++11, since we are using older GNU compilers and have no intention of upgrading any time soon. Since that's the case, would you still recommend using &s[0]?
From the 2003 standard:
23.2.4 Class template vector
[1] A vector is a kind of sequence that supports random access iterators. In addition, it supports (amortized)
constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage
management is handled automatically, though hints can be given to improve efficiency. The elements of a
vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type
other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
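The identity in the quoted paragraph can be checked directly. The helper below is a small sketch written for illustration (the name `obeys_contiguity_identity` is made up); as the standard text says, it does not apply to vector<bool>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks &v[n] == &v[0] + n for all 0 <= n < v.size().
// Not usable with std::vector<bool>, which the standard excludes.
template <typename T>
bool obeys_contiguity_identity(const std::vector<T>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) return false;
    }
    return true;  // trivially true for an empty vector
}
```

On a conforming implementation this should always return true, which is what makes &v[0] usable as a pointer to a C-style array of v.size() elements.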
-
June 28th, 2012, 06:16 AM
#14
Re: std::string and contiguous memory?
 Originally Posted by laserlight
Hmm... could you quote that part? I can't seem to find it among what sszd wrote, and I looked over both posts carefully. Twice 
My bad, I was reading laserlight's post.
-
June 28th, 2012, 08:12 AM
#15
Re: std::string and contiguous memory?
 Originally Posted by laserlight
Since you are compiling using a known set of compilers, you could just check if the given standard library implementations store the contents of std::string contiguously. If one of them doesn't, or if you cannot determine this, then I would not recommend using &s[0] as it is better to be safe than sorry.
Well, unless we were using a compiler that supports C++11, I am reluctant to use that construct regardless. And yes, I agree it's better to be safe than sorry. That's why I moved the contents into a vector in the first place.
 Originally Posted by laserlight
If this is due to a legacy interface that is not const-correct, and it is documented that the contents of the array is not modified through that pointer to non-const char, then using data() and casting away const-ness is fine.
It is not documented anywhere in the library whether the contents are modified. And again, it's better to be safe than sorry. One question though: is data() guaranteed to return a null-terminated string, or is that also implementation dependent? If it is not, this could also potentially cause a core dump in the library call.
 Originally Posted by D_Drmmr
If the memory pointed to by the (const-casted) pointer is not modified by the function, all is fine.
Yes, but we don't know that. And all would be fine only if data() returned a null terminated string.
 Originally Posted by Paul McKenzie
What's the signature of the called function? If it is not const char*, then why isn't it const char*?
Actually I mentioned that the signature of the COTS library requires a non-const pointer to char. What I should have said is a pointer to a non-const char, or more appropriately, a non-const pointer to a non-const char (i.e. char*), sorry. Anyway, I have no idea why the library calls were written that way. It is an extremely old COTS product that we are simply having to deal with. Oh, and thanks for the Scott Meyers quote!
 Originally Posted by Philip Nicoletti
From the 2003 standard:
Thanks for the quote.
==========================================================
So, after all of this, I have a few more questions.
Does anyone have a quote from the standard that indicates std::string is implementation dependent and is not guaranteed to be in contiguous memory (prior to the C++11 standard of course)?
Does data() return a char const* to memory that is independent of the memory where the actual string is stored, or is this also implementation dependent?
Is the location of the data that data() returns guaranteed to be null terminated?
The answers to these would indicate to me whether or not it’s safe to cast the const-ness away from the return from data(), regardless of whether or not the function is modifying the contents. But I usually steer away from dangerous (IMHO) stuff like this anyway and simply try to play it safe. That’s why I used a vector. My co-worker is very persistent though, so I just needed some proof.
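On the null-termination question: c_str() is required to return a null-terminated array in both C++03 and C++11, whereas data() only carries that requirement from C++11 on. A sketch that avoids depending on data() builds the mutable buffer from c_str() and copies the terminator as well. The helper name `make_c_buffer` is made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Builds a mutable, null-terminated copy of s that can be passed as char*.
// c_str() points at size() characters followed by '\0', so copying
// size() + 1 characters includes the terminator.
std::vector<char> make_c_buffer(const std::string& s) {
    const char* p = s.c_str();
    std::vector<char> buf(p, p + s.size() + 1);
    assert(buf.back() == '\0');
    return buf;
}
```

The resulting vector is never empty, so &buf[0] is valid even when the source string is empty.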