CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 3 of 3 FirstFirst 123
Results 31 to 44 of 44
  1. #31
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Need advice - what to use CStringArray or CStringList

    For example, if you just timed this search:
    Code:
    MapCStringToCString::iterator itFound = myData.find( strToSearch );
    if ( itFound != myMap.end() ) 
    {
       // we've found the name.  
       itFound->first // is the lower case version of the found name
       itFound->second //  this is the original case version of the name
    }
    Compared with your loop, you will see that this method runs rings around using arrays and looping methods. The only way the map wouldn't be faster is if your item in the array is within the first 15 or so elements in the array, and even then, the array method could be slower.

    If the items your finding are in the middle of your array, you do the math. You're calling CompareNoCase() on average 50,000 times to find one item. With the map, you're calling the straight compare (actutally the CString's operator ==) a maximum of 16 times. There really isn't any doubt which would be faster.

    Also for Visual C++, you need to set the preprocessor flag to enable full optimizations for STL components. That preprocessor flag is _SECURE_SCL=0 (add this to your preprocessor settings and rebuild your release version of the app). You will probably see a dramatic speed increase in the map method.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; September 25th, 2009 at 04:22 AM.

  2. #32
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    It is not clear what you are timing ... adding 128,079 elements to
    any of the containers mentioned should be very fast ... not in the
    40 second range. Are you sure that the container part of your
    algorithm is problem ?

    Also, using a binary search or a map would only help on the second
    search pass of your algorithm ... and only help the search part ... it
    you do other processing in the second pass you need to time that
    separately.
    Last edited by Philip Nicoletti; September 25th, 2009 at 04:24 AM.

  3. #33
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by ahmd View Post
    One more interesting result, so I don't really know what to think.

    128,079 files, ran only the first pass.

    In case of CStringArray ---- 00:40 sec
    In case of std::map<CString, CString> ---- 00:45 sec
    If I understand what you have posted so far ... the first pass searches
    thru your folders, and adds selected files to a container. If this is
    the case, then, as I mentioned in my previous post, adding to the
    container is probably not slowing you down on the first pass.


    Some example timings:

    Code:
    A) Adding 131,256 strings of size 200 to
        * a vector : 0.156 seconds
        * a CStringArray : 0.172 seconds
        * a map : 0.985 seconds
        * a deque : 0.061 seconds
    
    B) Adding 965,503 strings of size 200 to
        * a vector : 0.703 seconds
        * a CStringArray : 5.547 seconds
        * a map : 9.594 seconds
        * a deque : 0.25 seconds
    This is about what I would expect. Adding to a vetcor/Cstring Array
    is O(n) (amortized). Adding to a map is O(n * LOG(n) )

    A deque is also O(n) to add ... but does not do rellocates/re-copies.

    Also, as the number of elements gets large, CStringArray does a lot
    more reallocations/copies than vector. (I don't know if MS has changed
    the re-allocate algorithm in the its later versions of the compiler).

    A map is going to be slower at adding ... its use is for fast searching ...
    O(LOG(n)) compared to a linear search : O(n)
    Last edited by Philip Nicoletti; September 25th, 2009 at 07:38 AM. Reason: added std::deque info

  4. #34
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: Need advice - what to use CStringArray or CStringList

    Guys, you misread my results. You need to look at my first post:
    http://www.codeguru.com/forum/showpo...5&postcount=27
    that gave the timing for the "real life" (actual) situation. As you can see I have a gain of 1 second with Paul's method.

    Since that number of runs (862 files, 646 searches) was way lower than I announced, I decided to try the first pass only for a greater number of files with Paul's method. And that is the results from the second thread:
    http://www.codeguru.com/forum/showpo...5&postcount=29
    Again, in this case all that was done was a recursive search for all the files in a folder (including subfolders), and the second pass was not performed (due to the reasons of time it would take and due to the fact that I could not perform an algorithm of the second pass on every type of file included in the first pass.)

  5. #35
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by ahmd View Post
    Guys, you misread my results. You need to look at my first post:
    http://www.codeguru.com/forum/showpo...5&postcount=27
    that gave the timing for the "real life" (actual) situation. As you can see I have a gain of 1 second with Paul's method.

    Since that number of runs (862 files, 646 searches) was way lower than I announced, I decided to try the first pass only for a greater number of files with Paul's method. And that is the results from the second thread:
    http://www.codeguru.com/forum/showpo...5&postcount=29
    Again, in this case all that was done was a recursive search for all the files in a folder (including subfolders), and the second pass was not performed (due to the reasons of time it would take and due to the fact that I could not perform an algorithm of the second pass on every type of file included in the first pass.)
    I'm not sure what you mean by mis-read your results. Your description above
    is exactly what I thought your results represented.

    Obviously, with 862 files and 646 searches, the "container-portion" of your
    algorithm will takle less than a second (not matter what type of container
    you use).

    It is only for larger number of elements and searches that the container to use
    and the search method will be significant.

    Even about 200,000 elements will not take long to add ... but could take a long time
    if you do 200,000 linear searches on them.
    Last edited by Philip Nicoletti; September 25th, 2009 at 12:42 PM.

  6. #36
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    An example of doing a linear search versus a binary search on sorted data:

    20,000 strings in the container ...
    20,000 searches (searched for each element in the container):

    a) linear search : 45 seconds
    b) binary search (0.2 seconds ... including the time to do the sort).

    It would get even more dramatic as the number of elements/searches increases.

  7. #37
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by ahmd View Post
    Guys, you misread my results. You need to look at my first post:
    http://www.codeguru.com/forum/showpo...5&postcount=27
    that gave the timing for the "real life" (actual) situation. As you can see I have a gain of 1 second with Paul's method.
    It seems for Visual C++, the array search is very fast, and will only be noticeable if the number of elements are very large.

    However regardless of that, the fact still remains that you will be calling a very fast comparison routine (either CompareNoCase or CString::operator ==) a maximum of 100,000 times using an array of 100,000 names, as opposed to calling CString::operator == (the map::find() calls it) a maximum of 16 times if using a map.

    I don't know about you, but I would take 16 times max over 100,000 times max any day.

    Regards,

    Paul McKenzie

  8. #38
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: Need advice - what to use CStringArray or CStringList

    Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more. Does map array use more memory then a linear CStringArray and should that be my concern for a big number of files on a system with less RAM?

    Also, Philip, I agree with you that the operations with the memory container itself won't take up all the time that I reported. Most of that time is used by the file I/O operations, but in this case it doesn't matter since I ran all three tests (with your methods) with the same set of files.

    Again, my original issue was that CStringArray was getting progressively slower when I was adding a big number of files into it (on par of hunderds of thousands), but as my last test showed that std::map and Paul's method (excluding the search part) is actually a bit slower than CStringArray when adding new elements. (My only doubt now after Paul's previous post is whether I ran his method in a debug build by accident and that is why it showed such a stark contrast in time.)

  9. #39
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by ahmd View Post
    Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more. Does map array use more memory then a linear CStringArray and should that be my concern for a big number of files on a system with less RAM?

    Also, Philip, I agree with you that the operations with the memory container itself won't take up all the time that I reported. Most of that time is used by the file I/O operations, but in this case it doesn't matter since I ran all three tests (with your methods) with the same set of files.

    Again, my original issue was that CStringArray was getting progressively slower when I was adding a big number of files into it (on par of hunderds of thousands), but as my last test showed that std::map and Paul's method (excluding the search part) is actually a bit slower than CStringArray when adding new elements. (My only doubt now after Paul's previous post is whether I ran his method in a debug build by accident and that is why it showed such a stark contrast in time.)
    maybe I am getting a little confused on the tests that you ran.

    Correct me if I am wrong:

    Test # 1) 862 files ... 646 searches ... ran both passes

    All methods were about the same (as expected). The container
    operations (adding 862 elements and doing 646 searches on 862
    elemnets will take less than a second, even on a slow computer).

    Test # 2) Adding 128,079 files to the container ... no searches.

    The CStringArray was faster than the map (again as expected) ...
    As shown by both your results and the benchmarks that I
    provided also).

    Adding to a CStringArray is O(n) amoritized. Adding to a map
    is O(n*Log(n)), since it sorts the the elements when it adds.
    As the number of elements increases, a map might become
    faster than a CStringArray at adding elements, because
    of CStringArrays non-agressive reallocation algorithm
    (at least, up to VC2003)

    The reason for using a map or sorted container is to speed up
    the searches, not in the insertions.

    And yes, CStringArray will get slower when inserting elements
    (because of rellocations and recopies). vector will be faster
    since it has a more aggressive reallocation scheme. deque
    will be faster still as it does not do any reallocations.
    (see my previous benchmarks on this).
    Last edited by Philip Nicoletti; September 25th, 2009 at 05:55 PM.

  10. #40
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by ahmd View Post
    Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more.
    A map is usually implemented as a tree, so to maintain the tree struture requires more memory for each item added to the tree. The extra elements are the branches of the tree, so that searching and traversing the tree is done in minimal time.

    An array is probably the data structure that requires the least memory for storing string data. Any other container, be it a linked list, deque, map, etc. requires more memory. But for that saving in memory, you sacrifice fast(er) lookups, and you get slower insertions/deletions.

    Regards,

    Paul McKenzie

  11. #41
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    a map is usually used for "key/value" lookup. You do not have a
    value,so a map might not make the most sense (a std::set is
    possible).

    But even using a std::set, memory will spike. I recommend using
    one of the dynamic array containers(CStringArray, vector, deque).

    AFTER adding ALL the elements, sort them. Then you can use
    a binary search to get fast lookups without using the extra
    memory.

    Quick test: 200,000 vector , sorted, looking up all 200,000
    elements using a binary search : 2.2 seconds (including the
    one time sort).

  12. #42
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: Need advice - what to use CStringArray or CStringList

    Quote Originally Posted by Philip Nicoletti View Post
    maybe I am getting a little confused on the tests that you ran.

    Correct me if I am wrong:
    Philip, come on, I said this several times already. You're right in your assumptions, except that you don't see that I'm constructing this array of file paths after finding them out with FindFirstFile/FindNextFile API's. Then during the second pass I open and read each file to get to its contents. That is what is most of that time is used up for when the array is small.


    So, guys, what is the bottom line here? That the fastest way would be to use CStringArray, comprised of low-case paths, sorted alphabetically, and then use binary search for a look-up. Is that it?

    If so, one question. What is more effecient to add elements according to the sort order, or add them as they are and sort the whole array when we're done?

  13. #43
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Need advice - what to use CStringArray or CStringList

    1) if possible, always add all elements before sorting ... O(nLogN)
    versus O(n*n). In your case, you can add all elements first.

    2) For small number of elements, adding to a CStringArray is fine. As
    the number of elements gets larger, adding to the CStringArray gets
    more and more inefficient. I suggest running a benchmarks with
    the typical values that you expect. I would use a vector or deque
    instead. Again, run a few benchmarks to see the difference in
    performance.

    3) For large number of files with a large number of searches, you
    would have to go with a conatiner that can do fast searches.

    Have you tried doing 100,000 searches on 100,000 elements using
    a linear search versus doing a binary search ?



    Philip, come on, I said this several times already. You're right in your assumptions, except that you don't see that I'm constructing this array of file paths after finding them out with FindFirstFile/FindNextFile API's. Then during the second pass I open and read each file to get to its contents. That is what is most of that time is used up for when the array is small.
    I am not sure why you think I am not seeing that. Every benchmark
    timing that I have provided assumed that.

    My conclusiion : For a small number of files, the container
    used and the serach method will not take much time. It is
    only for a large number of files and searches that the container
    and search method matters.

  14. #44
    Join Date
    Feb 2009
    Location
    Portland, OR
    Posts
    1,488

    Re: [RESOLVED] Need advice - what to use CStringArray or CStringList

    OK. Last post. Thank you everyone, who contributed, especially Paul and Philip, I appreciate your insight.

Page 3 of 3 FirstFirst 123

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured