Thread: <vector> question

  1. #1
    Join Date
    Jul 2008
    Posts
    3

    <vector> question

    I am working with very large files and storing a lot of data in a vector. I would like to make copying the data as fast as possible. Since I'm using the STL, I don't have to worry about writing the routines to be as fast and efficient as possible, but I do have a choice:

    1. Use vector_name.reserve(length) and then a series of push_backs to fill in the data

    --or--

    2. Use assign(array_begin_address, array_end_address)--without reserve, since assign would change the pointer anyway.
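
    A minimal sketch of the two options, assuming the source is a plain array of double. The function names copy_with_push_back and copy_with_assign are illustrative only and do not come from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: reserve once, then append element by element.
std::vector<double> copy_with_push_back(const double* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::vector<double> v;
    v.reserve(length);                 // single allocation up front
    for (std::size_t i = 0; i < length; ++i) {
        v.push_back(data[i]);          // capacity is already sufficient, so no reallocation
    }
    return v;
}

// Option 2: hand the whole range to assign().
std::vector<double> copy_with_assign(const double* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::vector<double> v;
    v.assign(data, data + length);     // replaces the contents with a copy of [data, data + length)
    return v;
}
```

    Both functions leave the source array untouched and produce vectors with identical contents.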

    I am lucky enough to have been furnished with a very fast computer; I have tried both ways and can't tell the difference time-wise. However, the users of my software might not be so lucky. If anyone could answer this, I'd be very grateful.

    -DM

  2. #2
    Join Date
    Mar 2002
    Location
    St. Petersburg, Florida, USA
    Posts
    12,116

    Re: <vector> question

    Remember the first rule of performance...

    Performance is ONLY a priority when you have measured that an implementation has a significant impact on meeting the documented performance requirements. In all other cases, readability and maintainability over-ride performance.

    That being said, look at the logic of your code. If you are "logically" appending to the array, then push_back(...) is the clearest explanation of your intent. Adding reserve(...) keeps this intent visible while providing the performance boost.

    On the other hand, assign(...) indicates that you are putting something into the vector at a specific place. A new reader of your code would have to examine the logic of your parameters to finally realize that this was appending to the end of the vector. Additionally, a bug could easily cause you to perform some action other than appending (consider what happens if the vector length changes somehow between the point where you calculate the offsets and the point where the assign takes place).

    Looked at that way, the choice (at least to me) is clear!
    TheCPUWizard is a registered trademark, all rights reserved. (If this post was helpful, please RATE it!)
    2008, 2009,2010
    In theory, there is no difference between theory and practice; in practice there is.

  3. #3
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,716

    Re: <vector> question

    I prefer assign. If you have Scott Meyers' "Effective STL", see
    Item 5: "Prefer range member functions to their single-element
    counterparts". The very first example is "assign versus push_back".


    Quote Originally Posted by TheCPUWizard
    On the other hand, assign(...) indicates that you are putting something into the vector at a specific place.
    Actually, assign replaces the contents of the container.

    So it is the equivalent of clear(), reserve(), and multiple push_backs.

    You were probably thinking of insert() (which Meyers also prefers).
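
    The difference between the two range members can be sketched as follows. The helper names replace_contents and append_range are made up for illustration:

```cpp
#include <vector>

// assign() discards whatever the vector held and copies the range in.
std::vector<int> replace_contents(std::vector<int> v, const int* first, const int* last) {
    v.assign(first, last);
    return v;
}

// Range insert() at end() keeps the existing elements and appends the range.
std::vector<int> append_range(std::vector<int> v, const int* first, const int* last) {
    v.insert(v.end(), first, last);
    return v;
}
```

    Starting from a vector holding 1, 2, 3 and a source array holding 7, 8, the first call yields 7, 8 and the second yields 1, 2, 3, 7, 8.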

  4. #4
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: <vector> question

    What is the size of your "very large file"? Since the file is very large, why read it into memory?
    Cigagou,Cogitou!

  5. #5
    Join Date
    Jul 2008
    Posts
    3

    Re: <vector> question

    First, thanks for your responses.

    My large files are hundreds of MBs to GBs in size. I'm working with CAD and discrete data files, and the vectors that I'm creating are of a known size. The point of creating these vectors is to repair the data or CAD representation of these files by closing gaps and repairing overlaps.

    The implementation of the arrays is fixed. I am reading the information in from a file or from a user via an API. I can't touch the original arrays because the original integrity of the data cannot be compromised. Because I can't touch the original arrays, I'm putting the data in a convenient wrapper, std::vector, so that I can work with it.

    The question was posed to find the fastest and cheapest way to copy the data into a vector, using either assign or reserve + push_back. A pointer to a vector is a member of the class "Mesh", and these pointers are pointed at "newed" vectors during initialization of a Mesh instance. As I understand it, the number of operations is the same, since the data is copied in both cases and the number of element copies equals the number of elements in the array.

    Assign destroys the data the old vector holds and allocates memory for the new array based on the pointers passed to "assign". Then it copies the data.

    Reserve first allocates the array, and then push_back copies the array contents into the vector. What I was wondering is whether the overhead from so many push_backs outweighs a single call to assign. I know the same thing happens in both cases; which one is faster?
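
    One part of this can be checked directly: after reserve, the push_backs do not reallocate. The sketch below counts capacity changes while appending; count_reallocations is a hypothetical helper, and the count without reserve depends on the growth policy of the standard library in use:

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while n elements
// are appended, with or without a reserve(n) beforehand.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {    // capacity only changes when the buffer is reallocated
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

    With reserve_first set to true the function returns 0, so what remains of the difference is the per-call cost of each push_back (a capacity check plus one element copy) compared with whatever loop assign runs internally.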

  6. #6
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: <vector> question

    If all your data is already stored in the array and does not need to be modified while it is copied, assign is the better choice!
    Otherwise, choose reserve.

  7. #7
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,768

    Re: <vector> question

    If you have Scott Meyers' "Effective STL", see
    Item 5: "Prefer range member functions to their single-element
    counterparts". The very first example is "assign versus push_back".
    I do not have a copy of that book with me now... so what is Meyers' reasoning, considering that reserve() can be used?
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

  8. #8
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: <vector> question

    In my experience, the two methods show no evident difference!
    You need to break the very large data set into pieces small enough that they can be pushed back into the vector with the efficiency you expect.

  9. #9
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,768

    Re: <vector> question

    Meyers may have more than just measurable efficiency in mind when he recommends assign over push_back.

  10. #10
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: <vector> question

    Using assign can be more efficient than inserting the elements one by one! But that depends on all your data already being stored in a sequence container, such as a vector, list, or array.

    It also assumes that you don't want to modify any element while you insert it!

  11. #11
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: <vector> question

    This is the real meaning of Item 5 in Meyers' "Effective STL"!

  12. #12
    Join Date
    Apr 2004
    Location
    England, Europe
    Posts
    2,492

    Re: <vector> question

    Sccrman06, if you can't load the data straight into the vector, I would use assign because that function does exactly what you are trying to do (make a vector which contains the given data) in one go.
    Last edited by Zaccheus; July 10th, 2008 at 04:39 AM.
    My hobby projects:
    www.rclsoftware.org.uk

  13. #13
    Join Date
    Mar 2002
    Location
    St. Petersburg, Florida, USA
    Posts
    12,116

    Re: <vector> question

    Quote Originally Posted by laserlight
    Meyers may have more than just measurable efficiency in mind when he recommends assign over push_back.
    EXACTLY!

    The question of which is faster should be considered only after measurements are taken which determine if the time is meaningful.

    Also, finding the time in the general case is very, very difficult. It may vary depending on the version/implementation of the STL (different results for different compilers). It may also depend on locality of data (impacting L1/L2 caching, etc.). It may also depend on the amount of data.

    Since all of these variables exist, one cannot come up with a definitive answer (if anyone is willing to state that one is faster than the other for all possible conditions and put money on the table, let me know!), so the decision should revert to which gives the cleaner code.
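
    For anyone who does want to take the measurement on their own compiler and data, a rough timing sketch might look like the following. The names time_push_back and time_assign are illustrative, element type double is an assumption, and a single run like this is noisy, so repeat it and use an optimized build before drawing any conclusion:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Times one reserve + push_back copy of source into out.
// The result is kept in out so the work cannot be optimized away entirely.
std::chrono::nanoseconds time_push_back(const std::vector<double>& source,
                                        std::vector<double>& out) {
    const Clock::time_point start = Clock::now();
    out.clear();
    out.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        out.push_back(source[i]);
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start);
}

// Times one assign() copy of source into out.
std::chrono::nanoseconds time_assign(const std::vector<double>& source,
                                     std::vector<double>& out) {
    const Clock::time_point start = Clock::now();
    out.assign(source.begin(), source.end());
    return std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start);
}
```

    Calling both functions with the same source and comparing the returned durations over many runs gives a figure for one particular machine, compiler, and data size, which is all such a measurement can claim.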

  14. #14
    Join Date
    Jul 2008
    Posts
    3

    Re: <vector> question

    Thanks for all of the advice, guys.

    I think I'll be going with assign since it makes for cleaner code. Assign requires one statement, while reserve + push_back requires two statements plus a for loop. I still would like to know which is faster in general. By that I mean which requires fewer operations or whatever it means to be "faster."

  15. #15
    Join Date
    Mar 2002
    Location
    St. Petersburg, Florida, USA
    Posts
    12,116

    Re: <vector> question

    Quote Originally Posted by Sccrman06
    I still would like to know which is faster in general. By that I mean which requires fewer operations or whatever it means to be "faster."
    Re-read reply #13... It is impossible to make a 100% accurate prediction of which approach will be faster. I can generate conditions in a program which will make either one faster than the other for a VERY specific use case.

    Going by "cleaner code" (which is what you are doing) is the only rational approach.
