std::string to std::vector<TCHAR> with terminator

Thread: std::string to std::vector<TCHAR> with terminator

  1. #1
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,100

    std::string to std::vector<TCHAR> with terminator

    Been wondering about this one...

    I need to copy a std::string into a vector<TCHAR> (many times) and I'm looking at the most efficient way to do so.

    Code:
    vec.assign( str.begin(), str.end() );
    is insufficient, as it doesn't add the zero terminator.

    Code:
    vec.assign( str.begin(), str.end() );
    vec.push_back( _T('\0') );
    works but can cause a reallocation (=bad).

    Code:
    vec.reserve( str.size()+1 );
    vec.assign( str.begin(), str.end() );
    vec.push_back( _T('\0') );
    solves the reallocation, but adds quite a bit of unneeded bulk: there's now an unnecessary size check in assign() as well as in push_back().

    Is this the only thing I can do, or is there a better solution that does exactly this: reserve size+1, copy size characters from the string to the vector, set the terminator, without any excess?

  2. #2
    Join Date
    Apr 1999
    Posts
    27,446

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by OReubens View Post
    Been wondering about this one...

    I need to copy a std::string into a vector<TCHAR> (many times) and I'm looking at the most efficient way to do so.
    Note that you should profile your code using a profiler and not assume that some method is slow or fast "by eyesight". C++ is a language where many things cannot be judged for speed by just looking at the code.

    Here is another method:
    Code:
    vec.reserve( str.size()+1, 0 );
    std::copy(str.begin(), str.end(), vec.begin());
    Regards,

    Paul McKenzie

  3. #3
    VictorN (Super Moderator, Power Poster)
    Join Date
    Jan 2003
    Location
    Wallisellen (ZH), Switzerland
    Posts
    17,611

    Re: std::string to std::vector<TCHAR> with terminator

    Also note that a conversion from char* to TCHAR* may be needed in a UNICODE build.
    Victor Nijegorodov

  4. #4
    Join Date
    Oct 2008
    Posts
    1,168

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by OReubens View Post
    is this the only thing I can do, or is there a better solution that does: reserve size+1, copy size bytes from string to vector, set terminator. without any excess ?
    You could use a specialized allocator that replaces default construction with a do-nothing operation, and hope that the compiler is smart enough to eliminate the loop when resizing the vector. If it is, you can use resize() + copy() + back() = '\0'.

    BTW, you could also insert a null terminator in the original std::string, but I suppose you already thought about that.

  5. #5
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,100

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by Paul McKenzie View Post
    Note that you should profile your code using a profiler and not assume that some method is slow or fast "by eyesight". C++ is a language where many things cannot be judged for speed by just looking at the code.
    I am aware of this.

    Quote Originally Posted by Paul McKenzie View Post
    Here is another method:
    Code:
    vec.reserve( str.size()+1, 0 );
    std::copy(str.begin(), str.end(), vec.begin());
    There doesn't appear to be a vector::reserve() with 2 parameters?
    Did you intend the constructor with a size + filler char? Or vector::resize(size, filler)?

    In any case, the constructor approach wouldn't be usable here, since the vector is a member of a class and this code is used in a setter member function.
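A minimal sketch of that setter situation, for illustration only: it assumes plain char instead of TCHAR, and the class TextHolder with its set(), data() and size() members is hypothetical. It uses the reserve + assign + push_back variant from the first post; because the member vector keeps its capacity between calls, reserve() only allocates when a longer string than before arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class standing in for the OP's class with a vector member.
class TextHolder {
public:
    // Setter: copy s into the member buffer and append a terminator.
    void set(const std::string& s) {
        buf_.reserve(s.size() + 1);       // no-op if capacity is already large enough
        buf_.assign(s.begin(), s.end());  // copies s.size() chars into existing storage
        buf_.push_back('\0');             // cannot reallocate: capacity >= s.size() + 1
    }
    char* data() { return buf_.data(); }  // writable buffer for a legacy API
    std::size_t size() const { return buf_.size(); }

private:
    std::vector<char> buf_;
};
```

The per-call size checks the OP objects to are still there; the sketch only shows that the member buffer is reused across setter calls.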

    The resize with a filler has the overhead of first filling the vector buffer with 0 and then overwriting all but the last TCHAR with the contents of the string, which is fairly costly on large strings.

    Unfortunately, resize() without a filler also zero-fills the new elements.

    reserve+copy doesn't work because reserve() doesn't change the size of the vector, so the subsequent copy writes past the end (undefined behavior; the debug build reports an error).
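A sketch, assuming plain char, of how reserve + copy can be made legal: std::back_inserter appends through push_back(), so the size grows as elements are written. This is a swapped-in technique, not one proposed in the thread, and it still performs a capacity check per element, so it does not remove the overhead discussed here; it only shows why copying into vec.begin() after reserve() is wrong. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// reserve() changes capacity, not size, so vec.begin() + n is not writable.
// Appending through std::back_inserter grows the size element by element.
void copy_with_back_inserter(std::vector<char>& vec, const std::string& str) {
    vec.clear();                  // size 0, capacity kept
    vec.reserve(str.size() + 1);  // at most one allocation
    std::copy(str.begin(), str.end(), std::back_inserter(vec));
    vec.push_back('\0');          // fits in the reserved capacity
}
```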


    Quote Originally Posted by VictorN View Post
    Also note that a conversion from char* to TCHAR* may be needed in a UNICODE build.
    Not an issue in this case: the Unicode build is set up to use std::wstring, so it remains a straightforward copy.


    Quote Originally Posted by superbonzo View Post
    You could use a specialized allocator that replaces default construction with a do-nothing operation, and hope that the compiler is smart enough to eliminate the loop when resizing the vector. If it is, you can use resize() + copy() + back() = '\0'.

    BTW, you could also insert a null terminator in the original std::string, but I suppose you already thought about that.
    that custom allocator idea seems very flaky :s

    Changing the passed std::string is a bad idea. I can't go around changing the caller's data; all of those strings are const for a reason. It also has the nasty side effect that adding the extra terminator might need a reallocation as well (which is exactly what I'm trying to avoid).



    Still no acceptable vector-based solution.
    The only way out I currently see is changing vector<TCHAR> to unique_ptr<TCHAR[]> and doing all buffer management myself. I'll need to file a 'breach of specs' form for this, which is going to be a hassle... sigh... let alone changing and testing all the downlevel code to deal with that change.

  6. #6
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,100

    Re: std::string to std::vector<TCHAR> with terminator

    Also, resize+copy alone doesn't work if the vector is already larger: resize() then just shrinks it without zero-filling anything, so there is no terminator at the end of the string and you still need to store one yourself. So
    Code:
    vec.resize(str.size()+1);
    std::copy(str.begin(), str.end(), vec.begin());
    vec[str.size()]=0;
    works, but still has the excess zero-fill whenever the size increases.
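The shrinking case described above can be reproduced with plain char (an assumption standing in for TCHAR). The two hypothetical helpers below exist only to contrast the forms: the fill value of resize(n, '\0') applies only to newly added elements, so when the vector shrinks, the element at str.size() keeps whatever character was there before.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Without the explicit store: correct only when resize() grows the vector.
void resize_copy_only(std::vector<char>& vec, const std::string& str) {
    vec.resize(str.size() + 1, '\0');  // fill value is used for new elements only
    std::copy(str.begin(), str.end(), vec.begin());
}

// With the explicit store: correct whether resize() grows or shrinks.
void resize_copy_terminate(std::vector<char>& vec, const std::string& str) {
    vec.resize(str.size() + 1);
    std::copy(str.begin(), str.end(), vec.begin());
    vec[str.size()] = '\0';
}
```

After "abcdef" followed by "xyz", the first helper leaves "xyzd" in the buffer, while the second leaves "xyz\0".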

  7. #7
    Join Date
    Nov 2003
    Posts
    1,825

    Re: std::string to std::vector<TCHAR> with terminator

    >> Not an issue in this case, the unicode build is set up to use a std::wstring so it remains a straightforward copy.
    I'm confused. So the source of the copy is always "std::string", and the ANSI build uses a destination type of "std::vector<TCHAR>", but the Unicode build uses a destination type of "std::wstring"?
    Unicode: std::string --> std::wstring
    ANSI: std::string --> std::vector<TCHAR>

    Is that what you're dealing with?

    gg

  8. #8
    Join Date
    Oct 2008
    Posts
    1,168

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by OReubens View Post
    that custom allocator idea seems very flaky :s
    Anyway, it's reasonable to assume that the compiler will cancel out a do-nothing loop, and such allocator behavior is legal and not immoral from a semantics point of view. Actually, the only problem I see now is that older compilers (those not supporting the new C++11 allocator spec) use the copy constructor during vector::resize() instead of the default constructor.

    Anyway, here is a small test to see whether VC2010 actually optimizes such a scenario:

    Code:
    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <ctime>
    
    int main()
    {
    	std::vector<char> vec;
    	auto start = std::clock();
    
    	vec.resize( 10000000 );
    
    	std::cout << std::clock() - start << std::endl;
    }
    now, replace the vector value type with

    Code:
    struct char_ { char_(){} char_(char_ const& ){} char _; };
    The above, although not technically a valid vector value type, emulates an allocator implementing a do-nothing default construction (note that VC2010 does not support the new allocator specification).
    On my system, char measures 6-8 ms whilst char_ measures 0, showing that the optimization actually takes place.

  9. #9
    Join Date
    Nov 2003
    Posts
    1,825

    Re: std::string to std::vector<TCHAR> with terminator

    I'm partial to reserve + assign + c_str:
    Code:
        const string s = "123abc";
        vector<TCHAR> v;
    
        v.reserve(s.length() + 1);
        v.assign(s.c_str(), s.c_str() + s.length() + 1);
    This assumes that most std::string implementations use a C string as the internal representation, which makes c_str() calls nearly free.
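A sketch of this c_str() range idea made generic over the character type, so one function covers both std::string -> std::vector<char> and std::wstring -> std::vector<wchar_t>. The name assign_with_terminator is hypothetical, and reserve() is omitted on the assumption that assign() from a pointer range can compute the size up front. It relies on c_str() pointing at size() + 1 contiguous characters ending in a null terminator, which C++11 guarantees.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copy str plus its terminator into vec; works for char and wchar_t alike.
template <typename CharT, typename Traits, typename Alloc>
void assign_with_terminator(std::vector<CharT>& vec,
                            const std::basic_string<CharT, Traits, Alloc>& str) {
    const CharT* first = str.c_str();
    // The range [first, first + size() + 1) includes the null terminator.
    vec.assign(first, first + str.size() + 1);
}
```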

    >> I'm confused.
    >> Is that what you're dealing with?
    Or is it this:
    Unicode: std::wstring --> std::vector<wchar_t>
    ANSI: std::string --> std::vector<char>
    ?

    Out of curiosity, does this code actually have production builds for both ANSI and Unicode?
    If so, for what purpose - legacy OS's, or to support legacy app's that consume both targets?
    If not, does the opposite of the production target even compile?

    gg

  10. #10
    Join Date
    Apr 1999
    Posts
    27,446

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by OReubens View Post
    I am aware of this.


    There doesn't appear to be a vector::reserve() with 2 parameters?
    Did you intend the constructor with a size + filler char? Or vector::resize(size, filler)?
    Yes, I meant resize().

    Regards,

    Paul McKenzie

  11. #11
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,100

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by Codeplug View Post
    >> Not an issue in this case, the unicode build is set up to use a std::wstring so it remains a straightforward copy.
    I'm confused. So the source of the copy is always "std::string", and the ANSI build uses a destination type of "std::vector<TCHAR>", but the Unicode build uses a destination type of "std::wstring"?
    Unicode: std::string --> std::wstring
    ANSI: std::string --> std::vector<TCHAR>

    Is that what you're dealing with?

    gg
    No, the code uses TCHAR, with std::string in ANSI and std::wstring in Unicode. I should have simplified the question to char and std::string rather than mention TCHAR; that's just what the code looked like when I copied it.

    So the source string is always copied to a vector of the matching character type:
    ANSI: std::string -> std::vector<TCHAR> (TCHAR = char)
    Unicode: std::wstring -> std::vector<TCHAR> (TCHAR = wchar_t)

    The main reason is legacy APIs that require WRITE access to the string buffer, which std::string doesn't allow; hence the copying.
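To illustrate why the copy is needed, a sketch assuming plain char: LegacyToUpper is a hypothetical stand-in for a legacy C-style API that modifies the caller's buffer in place (it is not a real Windows function). The const input is copied, terminator included, into a writable vector, the API writes through buf.data(), and the result is read back.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical legacy API: upper-cases a null-terminated buffer in place.
void LegacyToUpper(char* buf) {
    for (; *buf != '\0'; ++buf)
        *buf = static_cast<char>(std::toupper(static_cast<unsigned char>(*buf)));
}

// Copy the const string into a writable, terminated buffer, call the API,
// and build a new string from the modified buffer.
std::string call_legacy(const std::string& input) {
    std::vector<char> buf(input.c_str(), input.c_str() + input.size() + 1);
    LegacyToUpper(buf.data());
    return std::string(buf.data());
}
```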

  12. #12
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,100

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by superbonzo View Post
    Anyway, it's reasonable to assume that the compiler will cancel out a do-nothing loop, and such allocator behavior is legal and not immoral from a semantics point of view. Actually, the only problem I see now is that older compilers (those not supporting the new C++11 allocator spec) use the copy constructor during vector::resize() instead of the default constructor.
    Compilers tend to be pretty good at optimizing even very complex looking code.

    now, replace the vector value type with

    Code:
    struct char_ { char_(){} char_(char_ const& ){} char _; };
    The above, although not technically a valid vector value type, emulates an allocator implementing a do-nothing default construction (note that VC2010 does not support the new allocator specification).
    On my system, char measures 6-8 ms whilst char_ measures 0, showing that the optimization actually takes place.
    interesting approach...

    Can you elaborate on the "not technically a valid vector value type"?

    I needed to add an assignment operator to the char_ class to make it work, but
    Code:
    void str2vec(std::vector<char_>& v, const std::string& s)
    {
    	v.resize( s.size()+1 );
    	std::copy(s.begin(), s.end(), v.begin());
    	v[s.size()]=0;
    }
    does seem to work.
    The zero-fill is effectively removed.
    the code behaves as expected for all tests.
    The only (big) disadvantage here is that the compiler now fails to optimize the std::copy into a memmove() call, but instead opts for a byte-by-byte copy loop
    Code:
    more:   mov         dl,byte ptr [eax]  
            mov         byte ptr [ecx+eax],dl  
            inc         eax  
            cmp         eax,edi  
            jne         more
    For short strings this would be an advantage; for very long strings, it's a noticeable slowdown.

    it also has a rather strange syntax to feed into the legacy API (getting the result as an LPTSTR)
    LPSTR lpsz = &vec[0].ch;
    which isn't as nice as the vec.data() it was before.

    It'll need some more testing, and it of course also changes the resulting type, so it's just as much a breach of spec as the unique_ptr<TCHAR[]> approach. Something tells me they'll like this way out even less.

  13. #13
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,864

    Re: std::string to std::vector<TCHAR> with terminator

    Why not just replace std::copy with memmove()?

    Code:
    #include <cstring>
    #include <string>
    #include <vector>

    void str2vec(std::vector<char>& v, const std::string& s)
    {
    	v.resize( s.size()+1 );
    	//std::copy(s.begin(), s.end(), v.begin());
    	std::memmove(&v[0], s.c_str(), s.size());   // vector storage is contiguous
    	v[s.size()]=0;
    }
    This works because the elements of a vector are contiguous.
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  14. #14
    Join Date
    Oct 2008
    Posts
    1,168

    Re: std::string to std::vector<TCHAR> with terminator

    Quote Originally Posted by OReubens View Post
    Can you elaborate on the "not technically a valid vector value type"?
    char_ is not copy-constructible anymore; indeed, the vector data may become garbage just after a simple push_back.

    On the contrary, if you have a compliant C++11 compiler, a do-nothing default constructor is sufficient because, AFAIR, the new container specification directly default-constructs elements into raw storage instead of copying a default-constructed instance as before. In other words, a "struct char_{ char_(){}; char _; };" would do the trick. However, the resulting char_ would still inhibit the memmove optimization (yes, you can write char_() = default; to avoid that, but this makes the zeroing kick in again).
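A sketch of the value-initialization rule behind this, with hypothetical type names: in C++11, vector::resize(n) value-initializes new elements, which for a type with a user-provided default constructor just calls that constructor, but for a type whose default constructor is defaulted zero-initializes the members first. The static_asserts only check type traits; they do not measure whether any zero-fill is actually emitted.

```cpp
#include <cassert>
#include <type_traits>

// User-provided do-nothing default constructor: value-initialization
// calls it and leaves c uninitialized, so no zeroing happens.
struct char_user { char_user() {} char c; };

// Defaulted default constructor: not user-provided, so value-initialization
// zero-initializes c first, i.e. the zeroing comes back.
struct char_dflt { char_dflt() = default; char c; };

static_assert(!std::is_trivially_default_constructible<char_user>::value,
              "user-provided default constructor is not trivial");
static_assert(std::is_trivially_default_constructible<char_dflt>::value,
              "defaulted default constructor is trivial");
static_assert(std::is_trivially_copyable<char_user>::value,
              "copying char_user is still trivial");
```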

    For this reason, I suggested writing an allocator instead (this use case is specifically allowed by the newest standard); it would solve the zeroing overhead, the memmove issue and the "strange syntax" issue. That said, I suppose only the latest Clang and GCC actually support this.
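A sketch of such an allocator, assuming a C++11 (or later) standard library with allocator_traits support, which rules out VC2010. The name default_init_allocator is hypothetical; the pattern wraps std::allocator and overrides only the no-argument construct() so that it default-initializes instead of value-initializing. Whether the optimizer then removes the now-empty construction loop is up to the compiler, and the vector's type becomes std::vector<char, default_init_allocator<char>>, so the interface change discussed in this thread still applies.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: no-argument construction default-initializes, so
// resize(n) on a vector of char does not zero-fill the new elements.
// All other construction is forwarded to the wrapped allocator.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the wrapped allocator's constructors

    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;  // default-init: leaves a char uninitialized
    }

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

using char_buffer = std::vector<char, default_init_allocator<char>>;

// resize() skips the zero-fill; the copy and the terminator then write
// every element, so nothing uninitialized is left behind.
void str2vec(char_buffer& v, const std::string& s) {
    v.resize(s.size() + 1);
    std::copy(s.begin(), s.end(), v.begin());
    v[s.size()] = '\0';
}
```

Because the element type is plain char again, v.data() yields a char* for the legacy API, and a std::copy between char ranges may once more be lowered to memmove by the library.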

    Anyway, did you consider Codeplug's reserve+assign+c_str suggestion? I think you could even spare the reserve call (since the const char* returned by c_str() is a random-access iterator, assign() should reserve automatically, but I'm not sure...).

  15. #15
    Join Date
    Nov 2003
    Posts
    1,825

    Re: std::string to std::vector<TCHAR> with terminator

    >> I think you could even spare the reserve call
    Agreed.

    Code:
        const string s = "123abc";
        vector<TCHAR> v;
    
        //v.reserve(s.length() + 1);
        v.assign(s.c_str(), s.c_str() + s.length() + 1);
    gg
