The best way to read 'n' chars from binary file to std::basic_string?

Thread: The best way to read 'n' chars from binary file to std::basic_string?

  1. #1
    Join Date
    Oct 2000
    Location
    London, England
    Posts
    4,773

    The best way to read 'n' chars from binary file to std::basic_string?

    I have two possible methods to read a fixed number of characters from a binary stream into a basic_string<unsigned char>.

    Which is preferable? And is there a way better than both of these?

    Assume I have a basic_ifstream<unsigned char> ifs positioned at the correct location.

    // method 1
    Code:
    std::vector< unsigned char > vecTemp( n );
    ifs.read( &vecTemp[0], n );
    std::basic_string< unsigned char > bs( vecTemp.begin(), vecTemp.end() );
    // method 2
    Code:
    // implement a copy_n function (or assume we have it)
    template <typename FwdIterator, typename size_type, typename OutputIterator >
    OutputIterator copy_n( FwdIterator in, size_type n, OutputIterator out )
    {
       while ( n-- )
       {
           *out = *in;
          ++in;
          ++out;
       }
       return out;
    }
    
    // now the code
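    std::basic_string< unsigned char > bs;
    bs.reserve( n ); // one allocation up front, so back_inserter never reallocates
    std::istreambuf_iterator< unsigned char > it( ifs ); // the class is istreambuf_iterator; there is no basic_istreambuf_iterator
    copy_n( it, n, std::back_inserter( bs ) );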
    Ignore the "effort" involved in writing copy_n. The real issue is the trade-off. The first solution requires a temporary vector (i.e. two allocations) simply to allow one large-scale "read" from the file, but that single read lets the ifstream optimize rather than pulling off one character at a time. (It's unlikely that the file will physically be read one character at a time in either case.) The second method needs no temporary buffer, but effectively loops over the characters one by one.
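
For reference, method 1 can be wrapped up so that a short read (fewer than n characters left in the file) is handled as well. This is only a sketch: read_via_vector and demo_read_via_vector are hypothetical names, and the demo uses char on an in-memory stream rather than unsigned char on a file, because the standard library only specifies char_traits for the built-in character types and basic_string<unsigned char> therefore depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Method 1 as a function: one bulk read into a temporary vector,
// then one copy into the string. Returns fewer than n characters
// if the stream runs out early.
template <typename CharT>
std::basic_string<CharT> read_via_vector(std::basic_istream<CharT>& in, std::size_t n)
{
    if (n == 0)
        return std::basic_string<CharT>();

    std::vector<CharT> tmp(n);                          // allocation 1
    in.read(&tmp[0], static_cast<std::streamsize>(n));  // single bulk read

    // gcount() reports how many characters the last read delivered.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= n);

    return std::basic_string<CharT>(&tmp[0], got);      // allocation 2
}

// Usage sketch: an in-memory stream stands in for the binary file.
// The embedded '\0' shows that the data is treated as binary.
inline std::string demo_read_via_vector()
{
    std::istringstream in(std::string("ab\0cdefgh", 9));
    return read_via_vector(in, 5);
}
```

Checking gcount() rather than assuming n characters arrived keeps the returned string from containing zero-filled padding at end of file.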

    Of course, there is a 3rd solution, but it's horrible, thus:

    Code:
    basic_string< unsigned char > bs( n, 0 ); // create of size n (there is no size-only constructor)
    unsigned char * buf = const_cast< unsigned char * >( bs.data() ); // evil!
    ifs.read( buf, n );
    Probably safe but has the evil const_cast.

  2. #2
    Join Date
    May 2000
    Location
    KY, USA
    Posts
    18,652

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Well...did you do some profiling? I assume that you are looking for the more efficient method since "best" is always subjective...

    Looking at the clarity of the code....the first version would win in my eyes...
    Ciao, Andreas

  3. #3
    Join Date
    Apr 2004
    Location
    Canada
    Posts
    1,342

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    The real issues are that the first solution requires me to create a temporary vector (i.e. 2 allocations) simply to allow a big-scale "read" from the file.
    What about:

    Code:
    std::basic_string< unsigned char > bs;
    ifs.read( &bs[0], n );
    Last edited by HighCommander4; May 19th, 2005 at 04:28 PM.
    Old Unix programmers never die, they just mv to /dev/null

  4. #4
    Join Date
    Apr 1999
    Posts
    27,431

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Quote Originally Posted by HighCommander4
    What about:

    Code:
    std::basic_string< unsigned char > bs;
    ifs.read( &bs[0], n );
    The underlying buffer is not guaranteed to be contiguous for std::basic_string. The only container that has a guarantee of contiguous memory is std::vector.
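
Note for anyone reading this with a newer compiler: from C++11 onwards the standard does require basic_string to store its characters contiguously, so the objection above describes the C++98/03 library. Under that assumption, reading straight into a pre-sized string looks roughly like the sketch below. read_into_string is a hypothetical name, and char is used in the demo instead of unsigned char.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// ASSUMPTION: compiled as C++11 or later, where &s[0] points at a
// contiguous buffer of s.size() characters. Not valid for C++98/03.
template <typename CharT>
std::basic_string<CharT> read_into_string(std::basic_istream<CharT>& in, std::size_t n)
{
    std::basic_string<CharT> s(n, CharT());  // size the string first
    if (n != 0) {
        in.read(&s[0], static_cast<std::streamsize>(n));
        // Trim to what was actually read, in case the file was shorter.
        s.resize(static_cast<std::size_t>(in.gcount()));
    }
    return s;
}

// Usage sketch on an in-memory stream standing in for the file.
inline std::string demo_read_into_string()
{
    std::istringstream in("hello world");
    return read_into_string(in, 5);
}
```

The string must be sized before the read. Reading into &bs[0] of an empty string, as in the quoted snippet, writes past the end of the buffer whatever the contiguity rules are.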

    Regards,

    Paul McKenzie

  5. #5
    Join Date
    Oct 2000
    Location
    London, England
    Posts
    4,773

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Furthermore, the result of bs[0] is not guaranteed to be a character, so &bs[0] is not guaranteed to be a pointer to the first character of data.

    bs[0] will often return an object of a proxy class, which has operator= overloaded (often to invoke a copy-on-write if there are multiple references to the underlying data). It also has an operator char_type() overload.
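
To make the proxy idea concrete, here is a toy copy-on-write string. It is purely illustrative: CowString and CharProxy are made-up names, it assumes C++11 for std::shared_ptr, and it is not how any particular library implements basic_string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// HYPOTHETICAL toy class, for illustration only.
class CowString {
public:
    // What operator[] hands back instead of a plain char&.
    class CharProxy {
    public:
        CharProxy(CowString& owner, std::size_t index)
            : owner_(owner), index_(index) {}

        // Writing through the proxy un-shares the buffer first.
        CharProxy& operator=(char c)
        {
            owner_.detach();
            (*owner_.data_)[index_] = c;
            return *this;
        }

        // Reading converts to a plain character and copies nothing.
        operator char() const { return (*owner_.data_)[index_]; }

    private:
        CowString& owner_;
        std::size_t index_;
    };

    explicit CowString(const std::string& s)
        : data_(std::make_shared<std::string>(s)) {}

    CharProxy operator[](std::size_t i) { return CharProxy(*this, i); }

    bool shares_buffer_with(const CowString& other) const
    {
        return data_ == other.data_;
    }

    std::string str() const { return *data_; }

private:
    // Make a private copy of the buffer if anyone else references it.
    void detach()
    {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);
    }

    std::shared_ptr<std::string> data_;
};
```

With a class like this, `char c = s[0];` and `s[0] = 'x';` both work, but `&s[0]` would be the address of a temporary proxy rather than of a character, so it cannot be handed to read().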

  6. #6
    Join Date
    Apr 2004
    Location
    Canada
    Posts
    1,342

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Quote Originally Posted by Paul McKenzie
    The underlying buffer is not guaranteed to be contiguous for std::basic_string. The only container that has a guarantee of contiguous memory is std::vector.
    That's interesting.... wouldn't it make sense for strings to be stored in contiguous memory by their very nature, for example because they support operations that find the occurrence of a *sequence* of characters? I'd find it very hard to imagine each character being stored in a linked list... how could functions like find() be implemented?

  7. #7
    Join Date
    Apr 2004
    Location
    Canada
    Posts
    1,342

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Quote Originally Posted by Improving
    Consider using an istreambuf_iterator. Something along the lines of...
    Code:
    std::ifstream inp_file("data.dat");
    std::string((std::istreambuf_iterator<char>(inp_file)),std::istreambuf_iterator<char>());
    You can't extract a specific number of characters like that...

  8. #8
    Join Date
    Apr 1999
    Posts
    27,431

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Quote Originally Posted by HighCommander4
    That's interesting.... wouldn't it make sense for strings to be stored in contiguous memory by their very nature, for example because they support operations that find the occurrence of a *sequence* of characters? I'd find it very hard to imagine each character being stored in a linked list... how could functions like find() be implemented?
    Well, the basic_string could be reference-counted. Changing the buffer directly like that will surely cause problems with the reference counting mechanism.

    Believe it or not, a std::string can be implemented like the non-standard rope class defined by the SGI STL implementation (http://www.sgi.com/tech/stl/Rope.html). A rope is not stored contiguously, but acts much like a std::string (with advantages and drawbacks).

    Regards,

    Paul McKenzie

  9. #9
    Join Date
    Oct 2000
    Location
    London, England
    Posts
    4,773

    Re: The best way to read 'n' chars from binary file to std::basic_string?

    Code:
    std::ifstream inp_file("data.dat");
    std::string((std::istreambuf_iterator<char>(inp_file)),std::istreambuf_iterator<char>());
    Quote Originally Posted by HighCommander4
    You can't extract a specific number of characters like that...
    Not in one statement but method 2 above uses istreambuf_iterator.
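
A bounded version of the istreambuf_iterator idiom might look like the sketch below, with a check for end of file that the copy_n in method 2 does not have. take_n is a hypothetical name, and the demo uses char on an in-memory stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Pull at most n characters off the stream buffer, stopping early
// at end of file instead of reading past it.
template <typename CharT>
std::basic_string<CharT> take_n(std::basic_istream<CharT>& in, std::size_t n)
{
    std::basic_string<CharT> out;
    out.reserve(n);  // avoid reallocation while appending

    std::istreambuf_iterator<CharT> it(in);
    const std::istreambuf_iterator<CharT> end;  // default-constructed == end of stream

    while (n != 0 && it != end) {
        out.push_back(*it);  // peek at the current character
        ++it;                // consume it
        --n;
    }
    return out;
}

// Usage sketch: two consecutive calls continue where the last one stopped.
inline std::string demo_take_n()
{
    std::istringstream in("abcdefgh");
    const std::string first = take_n(in, 3);
    const std::string second = take_n(in, 3);
    return first + "|" + second;
}
```

Because the iterator works directly on the stream buffer, each call leaves the read position just after the last character it took.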

    While it's true that bs[0], bs[1], ..., bs[n-1] might not be in contiguous memory, it is guaranteed that data() produces a contiguous sequence. That sequence must be stored as part of the string object, because its lifetime is tied to that of the object. Some implementations might hold a mutable char_type * and create the buffer the first time data() (or c_str()) is called, so that the string can be "built" up to that point without continual re-allocation (i.e. how "rope" basically works).

    My guess is that for any likely implementation of basic_string, to create a brand new one from nothing of a given size, calling data() on it and then const-casting the pointer would be a safe thing to do. If you fill the buffer first it would be even safer. A string implementation could have some static buffer it points to until you do the first setting, and this buffer would need to be of size >= your length. So you can't guarantee (unless you've set the buffer the normal way) that it will still be your own. But const-casting is always full of risks as you are doing things the implementer didn't intend you to do.

    For now, by the way, I have implemented version 2. If profiling proves this to be slow I will try version 1 and if that is also too slow I might write a custom class instead of using basic_string. As it involves using a lot of buffers of a fixed size (determined at run-time though not compile time), I might also introduce some pooling if necessary. Performance is quite an important issue.
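
One cheap step short of a custom class or a pool, if version 1 turns out to be needed: keep a single scratch buffer alive across reads, so that the temporary vector is allocated once rather than once per chunk. A rough sketch, with ChunkReader as a hypothetical name and char used in the demo:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// HYPOTHETICAL helper: owns one scratch buffer of a size fixed at
// run time and reuses it for every bulk read.
template <typename CharT>
class ChunkReader {
public:
    explicit ChunkReader(std::size_t chunk_size) : scratch_(chunk_size) {}

    // Reads up to chunk_size characters into 'out'.
    // Returns false once nothing more could be read.
    bool next(std::basic_istream<CharT>& in, std::basic_string<CharT>& out)
    {
        out.clear();
        if (scratch_.empty())
            return false;

        in.read(&scratch_[0], static_cast<std::streamsize>(scratch_.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());

        // assign() can reuse out's existing capacity when the caller
        // passes the same string on every call.
        out.assign(&scratch_[0], got);
        return got != 0;
    }

private:
    std::vector<CharT> scratch_;
};

// Usage sketch: count the chunks in an in-memory stream.
inline std::size_t demo_count_chunks()
{
    std::istringstream in("abcdefgh");
    ChunkReader<char> reader(3);
    std::string chunk;
    std::size_t count = 0;
    while (reader.next(in, chunk))
        ++count;
    return count;
}
```

Whether this is faster than version 2 in practice is something only the profiling mentioned above can settle.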
