-
[RESOLVED] Need advice - what to use CStringArray or CStringList
I have a program that first counts and remembers file paths in a specified folder in accordance with a certain filter and then processes them. The counting part of the function needs to collect all file names into an array. The processing part needs to see if a certain file name is in that array.
What type of arrays shall I use for that: CStringArray, CStringList or CStringMap? I'm kind of confused over what each of them does (please don't point me to a Microsoft description, I read it about five times and I still don't get it.) I'm basically looking for the fastest implementation of the dynamic array for this purpose.
-
Re: Need advice - what to use CStringArray or CStringList
Are you interested in the fasted insertion or fastest lookup?
-
Re: Need advice - what to use CStringArray or CStringList
I'll technically be using both.
-
Re: Need advice - what to use CStringArray or CStringList
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
I'll technically be using both.
Yes, of course, but which is going to be more important? Once collected, are you going to do lots of look-ups? I would consider using a map in this case.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
cilu
Yes, of course, but which is going to be more important? Once collected, are you going to do lots of look-ups? I would consider using a map in this case.
I agree. If you need to do a lot of lookups based on a value, a map would be best.
-
Re: Need advice - what to use CStringArray or CStringList
If number of items are only inserted at end, and around 100 elements reside in container, then std::vector is very appropriate. Even with 1000 elements, vector search works faster than maps!
-
Re: Need advice - what to use CStringArray or CStringList
I'm searching recursively thru files in a folder (so I can't tell you how many there are -- it may be anything from 0 to 10,000 and more). At this stage I need to collect all the file paths in an array.
When all files are found, the second round of recursive search opens each file and processes it in a certain way. At this point if a certain tag is found in a file, I need to match that tag with a file name in the array created above. Each file, on average, may contain from 0 to 10 of those tags, but that number can be more.
So what would you use in this case? (I can't go with std::vector since I've never used it before and I need to go with something that I already know.)
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
When all files are found, the second round of recursive search opens each file and processes it in a certain way. At this point if a certain tag is found in a file, I need to match that tag with a file name in the array created above. Each file, on average, may contain from 0 to 10 of those tags, but that number can be more.
Could you explain this in more detail ?
1) What is recursive about this search ?
2) What do you ultimately want :
a) for each file ... a list of tags in that file ?
b) for each tag ... a list of files that have that tag ?
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
I'm searching recursively thru files in a folder (so I can't tell you how many there are -- it may be anything from 0 to 10,000 and more). At this stage I need to collect all the file paths in an array.
When all files are found, the second round of recursive search opens each file and processes it in a certain way. At this point if a certain tag is found in a file, I need to match that tag with a file name in the array created above. Each file, on average, may contain from 0 to 10 of those tags, but that number can be more.
So what would you use in this case? (I can't go with std::vector since I've never used it before and I need to go with something that I already know.)
Why a second recursive search? You already have all your file names in your collection object, why not just iterate through that?
-
Re: Need advice - what to use CStringArray or CStringList
Just 2 cents...
I hope ahmd refers to a method that searches for all the files in a folder, gets / stores their paths and if a folder is encountered, it calls itself (recursion). So instead of storing only the paths, you can also store the information regarding the parent and the level (starting from 0 for the starting directory). You could just create a class that holds a files path (CString), parent (CString) and level (int) and then create a vector of that class?
BTW, using a vector is very simple...if you are not comfortable with STL, you are better off using CArray provided by MFC or even better if you use CAtlArray, if you do not need the serialization support provided by MFCs CObject.
Bhushan
-
Re: Need advice - what to use CStringArray or CStringList
As I said, here is a MFC code to do the find:
You can still get the path, parent and level info and store it in a class and create its MFC Array as I said before...
Bhushan
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
Philip Nicoletti
1) What is recursive about this search ?
Pretty much what Bhushan showed in his MSDN link -- if I find a folder the function calls itself with it's path.
Quote:
Originally Posted by
Philip Nicoletti
a) for each file ... a list of tags in that file ?
No.
Quote:
Originally Posted by
Philip Nicoletti
b) for each tag ... a list of files that have that tag ?
If I find that the tag matches I'm updating the file that this tag is in and save it back to disc.
Quote:
Originally Posted by
GCDEF
Why a second recursive search? You already have all your file names in your collection object, why not just iterate through that?
Good point. The reason I do this is because it's an option that can be switched off in the program. Without this option being on, I need to count all the files first to display the progress bar during the second pass.
Quote:
Originally Posted by
bhushan1980
So instead of storing only the paths, you can also store the information regarding the parent and the level (starting from 0 for the starting directory). You could just create a class that holds a files path (CString), parent (CString) and level (int) and then create a vector of that class?
I have the code that's already tested and working, rewriting it completely is not what I'm looking for at this point. (Again my goal is to speed it up/optimize it.) But thanks for offering. Truly I didn't think about such approach. Although, wouldn't it be much slower if I propagate the whole class instead of just a CString object?
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
Good point. The reason I do this is because it's an option that can be switched off in the program. Without this option being on, I need to count all the files first to display the progress bar during the second pass.
Okay, but what I'm saying is you can get all the paths from your container for the second pass. No need to go searching the whole directory structure again. That should be high on your list of things to eliminate if you're concerned about performance.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
GCDEF
Okay, but what I'm saying is you can get all the paths from your container for the second pass. No need to go searching the whole directory structure again. That should be high on your list of things to eliminate if you're concerned about performance.
Well, not really. I actually started from your approach and what I observed was that once the number of files gets over, lets say, a thousand, it takes longer to collect all file paths in a memory array and then use them from there, than simply doing two file search passes. What happens is that first it goes really fast but then it literally slows down to a crawl (probably due to constant memory reallocations of the dynamic array.) The file search, I believe is cached by the OS so the second pass runs very fast.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
What happens is that first it goes really fast but then it literally slows down to a crawl (probably due to constant memory reallocations of the dynamic array.) The file search, I believe is cached by the OS so the second pass runs very fast.
CArray can have that problem (altough 1000 seems very small for that
to happen). a std::vector will not run in to that difficulty nearly as much.
To prevent re-allocations, a CList or std:: list or std::deque could be
used.
When I asked : for each tag ... a list of files that have that tag ?
you answered : If I find that the tag matches I'm updating the file that this tag is in and save it back to disc.
So does this mean at some point, you are given a tag ... you need to
find ALL files that have that tag ? If so, then the following should work:
std::map< CString , list<CString> > (or similar MFC)
where the key is the tag , and the value is a list of files
having that tag.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
So what would you use in this case? (I can't go with std::vector since I've never used it before and I need to go with something that I already know.)
What if what you already know is less efficient than something you don't know? Are you willing to learn the thing that you don't know?
It has been reported (and I tend to agree), that the STL containers are more efficient and robust than the MFC containers. All it takes is an example or two from a book as to how to use them correctly. Unless you really have a huge stake in the serialization methods used in the MFC container classes, or you're maintaining a legacy MFC only app, IMO there is little, if any reason to be using the MFC container classes. I've seem rumors where the Microsoft engineers do not even use the MFC containers, but use the STL containers, but those are just rumors and I can't really substantiate them (but it wouldn't surprise me).
To be honest, I'm kind of surprised the MFC containers haven't been deprecated -- they have been around since 1990, before the STL containers made their way into the C++ standard library, and IMO been superseded by the standard components. There was nothing wrong with the MFC containers back in the early days of Visual C++ 1.0 or 1.5, but the standard C++ library container classes has surpassed those old MFC container classes (but again, only if you really and truly need the MFC serialization stuff in the MFC container classes, and even then, you can write your own serialization for the STL containers).
Now CString, that is a different story -- the CString class (which is now templated) is a very good implementation that has advantages over std::string.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
OK, guys, I see your point and I'm ready to do a comparative analysis, but I need your help to substitute my existing methods with the "better ones". So I'm currently using the following technique.
1. I have a global variable for the array that contains all paths:
Code:
CStringArray* pArrConvFiles;
2. During the first pass I collect file paths into that array like this:
Code:
pArrConvFiles->Add(strFilePath);
3. Then when I need to find if a certain file is in the array during the second pass I do the following:
Code:
for(int i = 0; i < pArrConvFiles->GetCount(); i++)
{
if(strFile.CompareNoCase(pArrConvFiles->GetAt(i)) == 0)
{
//Found it
return TRUE;
}
}
return FALSE;
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
OK, guys, I see your point and I'm ready to do a comparative analysis, but I need your help to substitute my existing methods with the "better ones". So I'm currently using the following technique.
1. I have a global variable for the array that contains all paths:
Code:
CStringArray* pArrConvFiles;
Why does pArrConvFiles need to be a pointer?
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
Paul McKenzie
Why does pArrConvFiles need to be a pointer?
OK, I'll change it to:
Code:
CStringArray ArrConvFiles;
but it has to be a pointer in #2 & 3. Does it make that much slower if it's a pointer?
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
3. Then when I need to find if a certain file is in the array during the second pass I do the following:
Code:
for(int i = 0; i < pArrConvFiles->GetCount(); i++)
{
if(strFile.CompareNoCase(pArrConvFiles->GetAt(i)) == 0)
{
//Found it
return TRUE;
}
}
return FALSE;
If you sorted the array of names first, then you could use a binary search instead of looping through each element. Or maybe you should use a different container, one that has fast lookups (map, for example).
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
Paul McKenzie
If you sorted the array of names first, then you could use a binary search instead of looping through each element. Or maybe you should use a different container, one that has fast lookups (map, for example).
Paul, I really don't know what it means. If you give me a code sample I'll substitute it with what I already have and run the two for the same folder and post here the exact time that it took two methods to complete.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
Paul, I really don't know what it means. If you give me a code sample I'll substitute it with what I already have and run the two for the same folder and post here the exact time that it took two methods to complete.
First, the bottleneck I see is the "compareNoCase" function, on top of using a linear data structure such as an array.
Possibly, you should use a map of <CString, CString>, where the key is the lower case version of the name, and the data (the second item in the map) is the original name without any case changes.
Code:
#include <map>
//...
typedef std::map<CString, CString> MapCStringToCString;
//...
MapCStringToCString myData;
//...
CString strFilePath; // your original data you want to "add"
CString strFilePathLower; // lower case version of the above
//...
//.. This is how you add a name to the map
myData.insert( std::make_pair( strFilePathLower, strFilePath) );
//...
Now that the map is built, it is easy (and faster) to do lookups.
Code:
CString strToSearch;
// make sure that strToSearch has been converted to lower case first
//..
MapCStringToCString::iterator itFound = myData.find( strToSearch );
if ( itFound != myMap.end() )
{
// we've found the name.
itFound->first // is the lower case version of the found name
itFound->second // this is the original case version of the name
}
If you have thousands of strings, this should be faster. Sorry, but I didn't use MFC map, only STL map. The possible slow part is the initial building of the map, but that should only be done once. If you're doing thousands of lookups, this is definitely faster. Note that there are absolutely no loops in the code. Since a map uses a tree structure, if you have 1,000 strings, you have a maximum of 10 comparisons. For 100,000 strings, the maximum number of comparisons is 16. Compare that with your loop --what if the item you want to find is at postion 99,999 of an array of 100,000? I think you would want to take a nap before you get a response back from the app if you used the array.
Whether this is the best way, container to use, etc. I don't think so, since I just came up with it out of the top of my head as a quick "solution" to what you posted. Others will mull over the advantages and disadvantages of what I posted.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
You can also sort your CStringArray and do a binary search directly on it:
Code:
#include <algorithm> // for sort and binary search
struct NoCaseLess
{
bool operator() (const CString & lhs , const CString & rhs)
{
return lhs.CompareNoCase(rhs)==-1;
}
};
// after first pass and you have all the files
const int n = pArrConvFiles->GetSize();
std::sort(&pArrConvFiles->operator[](0),&pArrConvFiles->operator[](n),NoCaseLess());
// and then when you want to search for a file
if (std::binary_search(&pArrConvFiles->operator[](0),&pArrConvFiles->operator[](n),strFile,NoCaseLess()))
{
// found
}
Question: How many times do you search thru the array ?
-
Re: Need advice - what to use CStringArray or CStringList
How can I post a new thread?
-
Re: Need advice - what to use CStringArray or CStringList
Thank you! Now I have two methods to try. Let me go with one at a time and I'll report which method is faster.
Quote:
Originally Posted by
Paul McKenzie
Possibly, you should use a map of <CString, CString>
Paul, since I'm using this on Windows OS (with a possible use of UNC network file names), it won't matter if a path is converted to all-lower letters, will it? If so, it will make it even faster if I eliminate your strFilePath member and keep strFilePathLower only.
Quote:
Originally Posted by
Philip Nicoletti
You can also sort your CStringArray and do a binary search directly on it:
Very good too. Philip, I'll try it as soon as I'm done with Paul's code.
Quote:
Originally Posted by
Philip Nicoletti
Question: How many times do you search thru the array ?
I can't really tell. It depends on the contents of files the function works with. It may be 1 through 10 per file on average.
Quote:
Originally Posted by
autostrad
How can I post a new thread?
Go to this link:
http://www.codeguru.com/forum/forumdisplay.php?f=7
and click "New thread" on the top left side.
-
Re: Need advice - what to use CStringArray or CStringList
OK. A follow-up to what I posted above. I did the test and my results are inconclusive. I tested it on 862 files (not much, but I can't afford more time for it at this moment). During the operation the search was performed 646 times during the second pass. I also tested it with a release build of the project.
My unaltered method:
00:46 sec
Philip Nicoletti's code:
00:46 sec
Paul McKenzie's code:
00:45 sec
-
Re: Need advice - what to use CStringArray or CStringList
Huh! just trying to catch up...
Quote:
I have the code that's already tested and working, rewriting it completely is not what I'm looking for at this point. (Again my goal is to speed it up/optimize it.) But thanks for offering. Truly I didn't think about such approach. Although, wouldn't it be much slower if I propagate the whole class instead of just a CString object?
Well, I understand that and in such a case, what I suggested might be trivial. I just got it at the back of my mind, when I did a code in C++ to store information on a tree structure with parent and child nodes and hence used a collection of a class with Names, Parents and level members...The information was finally stored in SQL server database. I just remember the main skeleton; hardly remember the code.
Regards,
Bhushan
-
Re: Need advice - what to use CStringArray or CStringList
One more interesting result, so I don't really know what to think.
128,079 files, ran only the first pass.
In case of CStringArray ---- 00:40 sec
In case of std::map<CString, CString> ---- 00:45 sec
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
One more interesting result, so I don't really know what to think.
128,079 files, ran only the first pass.
In case of CStringArray ---- 00:40 sec
In case of std::map<CString, CString> ---- 00:45 sec
We have to see exactly what you timed and your code.
There is no way a loop that calls CompareNoCase on each iteration, and the number of iterations can be over 100,000 is faster than a map or a binary search that makes at a maximum, 15 or 16 comparisons to find an item. Maybe there is something buggy in your application.
If it were the case where an array is as fast as a map for lookups, why use a map at all in any application, and not just use arrays? Of course that doesn't make sense, so we need to see what you've coded.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
For example, if you just timed this search:
Code:
MapCStringToCString::iterator itFound = myData.find( strToSearch );
if ( itFound != myMap.end() )
{
// we've found the name.
itFound->first // is the lower case version of the found name
itFound->second // this is the original case version of the name
}
Compared with your loop, you will see that this method runs rings around using arrays and looping methods. The only way the map wouldn't be faster is if your item in the array is within the first 15 or so elements in the array, and even then, the array method could be slower.
If the items your finding are in the middle of your array, you do the math. You're calling CompareNoCase() on average 50,000 times to find one item. With the map, you're calling the straight compare (actutally the CString's operator ==) a maximum of 16 times. There really isn't any doubt which would be faster.
Also for Visual C++, you need to set the preprocessor flag to enable full optimizations for STL components. That preprocessor flag is _SECURE_SCL=0 (add this to your preprocessor settings and rebuild your release version of the app). You will probably see a dramatic speed increase in the map method.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
It is not clear what you are timing ... adding 128,079 elements to
any of the containers mentioned should be very fast ... not in the
40 second range. Are you sure that the container part of your
algorithm is problem ?
Also, using a binary search or a map would only help on the second
search pass of your algorithm ... and only help the search part ... it
you do other processing in the second pass you need to time that
separately.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
One more interesting result, so I don't really know what to think.
128,079 files, ran only the first pass.
In case of CStringArray ---- 00:40 sec
In case of std::map<CString, CString> ---- 00:45 sec
If I understand what you have posted so far ... the first pass searches
thru your folders, and adds selected files to a container. If this is
the case, then, as I mentioned in my previous post, adding to the
container is probably not slowing you down on the first pass.
Some example timings:
Code:
A) Adding 131,256 strings of size 200 to
* a vector : 0.156 seconds
* a CStringArray : 0.172 seconds
* a map : 0.985 seconds
* a deque : 0.061 seconds
B) Adding 965,503 strings of size 200 to
* a vector : 0.703 seconds
* a CStringArray : 5.547 seconds
* a map : 9.594 seconds
* a deque : 0.25 seconds
This is about what I would expect. Adding to a vetcor/Cstring Array
is O(n) (amortized). Adding to a map is O(n * LOG(n) )
A deque is also O(n) to add ... but does not do rellocates/re-copies.
Also, as the number of elements gets large, CStringArray does a lot
more reallocations/copies than vector. (I don't know if MS has changed
the re-allocate algorithm in the its later versions of the compiler).
A map is going to be slower at adding ... its use is for fast searching ...
O(LOG(n)) compared to a linear search : O(n)
-
Re: Need advice - what to use CStringArray or CStringList
Guys, you misread my results. You need to look at my first post:
http://www.codeguru.com/forum/showpo...5&postcount=27
that gave the timing for the "real life" (actual) situation. As you can see I have a gain of 1 second with Paul's method.
Since that number of runs (862 files, 646 searches) was way lower than I announced, I decided to try the first pass only for a greater number of files with Paul's method. And that is the results from the second thread:
http://www.codeguru.com/forum/showpo...5&postcount=29
Again, in this case all that was done was a recursive search for all the files in a folder (including subfolders), and the second pass was not performed (due to the reasons of time it would take and due to the fact that I could not perform an algorithm of the second pass on every type of file included in the first pass.)
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
Guys, you misread my results. You need to look at my first post:
http://www.codeguru.com/forum/showpo...5&postcount=27
that gave the timing for the "real life" (actual) situation. As you can see I have a gain of 1 second with Paul's method.
Since that number of runs (862 files, 646 searches) was way lower than I announced, I decided to try
the first pass only for a greater number of files with Paul's method. And that is the results from the second thread:
http://www.codeguru.com/forum/showpo...5&postcount=29
Again, in this case all that was done was a recursive search for all the files in a folder (including subfolders), and the second pass was not performed (due to the reasons of time it would take and due to the fact that I could not perform an algorithm of the second pass on every type of file included in the first pass.)
I'm not sure what you mean by mis-read your results. Your description above
is exactly what I thought your results represented.
Obviously, with 862 files and 646 searches, the "container-portion" of your
algorithm will takle less than a second (not matter what type of container
you use).
It is only for larger number of elements and searches that the container to use
and the search method will be significant.
Even about 200,000 elements will not take long to add ... but could take a long time
if you do 200,000 linear searches on them.
-
Re: Need advice - what to use CStringArray or CStringList
An example of doing a linear search versus a binary search on sorted data:
20,000 strings in the container ...
20,000 searches (searched for each element in the container):
a) linear search : 45 seconds
b) binary search (0.2 seconds ... including the time to do the sort).
It would get even more dramatic as the number of elements/searches increases.
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
It seems for Visual C++, the array search is very fast, and will only be noticeable if the number of elements are very large.
However regardless of that, the fact still remains that you will be calling a very fast comparison routine (either CompareNoCase or CString::operator ==) a maximum of 100,000 times using an array of 100,000 names, as opposed to calling CString::operator == (the map::find() calls it) a maximum of 16 times if using a map.
I don't know about you, but I would take 16 times max over 100,000 times max any day.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more. Does map array use more memory then a linear CStringArray and should that be my concern for a big number of files on a system with less RAM?
Also, Philip, I agree with you that the operations with the memory container itself won't take up all the time that I reported. Most of that time is used by the file I/O operations, but in this case it doesn't matter since I ran all three tests (with your methods) with the same set of files.
Again, my original issue was that CStringArray was getting progressively slower when I was adding a big number of files into it (on par of hunderds of thousands), but as my last test showed that std::map and Paul's method (excluding the search part) is actually a bit slower than CStringArray when adding new elements. (My only doubt now after Paul's previous post is whether I ran his method in a debug build by accident and that is why it showed such a stark contrast in time.)
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more. Does map array use more memory then a linear CStringArray and should that be my concern for a big number of files on a system with less RAM?
Also, Philip, I agree with you that the operations with the memory container itself won't take up all the time that I reported. Most of that time is used by the file I/O operations, but in this case it doesn't matter since I ran all three tests (with your methods) with the same set of files.
Again, my original issue was that CStringArray was getting progressively slower when I was adding a big number of files into it (on par of hunderds of thousands), but as my last test showed that std::map and Paul's method (excluding the search part) is actually a bit slower than CStringArray when adding new elements. (My only doubt now after Paul's previous post is whether I ran his method in a debug build by accident and that is why it showed such a stark contrast in time.)
maybe I am getting a little confused on the tests that you ran.
Correct me if I am wrong:
Test # 1) 862 files ... 646 searches ... ran both passes
All methods were about the same (as expected). The container
operations (adding 862 elements and doing 646 searches on 862
elemnets will take less than a second, even on a slow computer).
Test # 2) Adding 128,079 files to the container ... no searches.
The CStringArray was faster than the map (again as expected) ...
As shown by both your results and the benchmarks that I
provided also).
Adding to a CStringArray is O(n) amoritized. Adding to a map
is O(n*Log(n)), since it sorts the the elements when it adds.
As the number of elements increases, a map might become
faster than a CStringArray at adding elements, because
of CStringArrays non-agressive reallocation algorithm
(at least, up to VC2003)
The reason for using a map or sorted container is to speed up
the searches, not in the insertions.
And yes, CStringArray will get slower when inserting elements
(because of rellocations and recopies). vector will be faster
since it has a more aggressive reallocation scheme. deque
will be faster still as it does not do any reallocations.
(see my previous benchmarks on this).
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
ahmd
Yes, Paul, I'll definitely use the map::find() search method. One question though, I observed during my tests that when I used maps vs. CStringArray the memory usage spiked at almost 10MB more.
A map is usually implemented as a tree, so to maintain the tree struture requires more memory for each item added to the tree. The extra elements are the branches of the tree, so that searching and traversing the tree is done in minimal time.
An array is probably the data structure that requires the least memory for storing string data. Any other container, be it a linked list, deque, map, etc. requires more memory. But for that saving in memory, you sacrifice fast(er) lookups, and you get slower insertions/deletions.
Regards,
Paul McKenzie
-
Re: Need advice - what to use CStringArray or CStringList
a map is usually used for "key/value" lookup. You do not have a
value,so a map might not make the most sense (a std::set is
possible).
But even using a std::set, memory will spike. I recommend using
one of the dynamic array containers(CStringArray, vector, deque).
AFTER adding ALL the elements, sort them. Then you can use
a binary search to get fast lookups without using the extra
memory.
Quick test: 200,000 vector , sorted, looking up all 200,000
elements using a binary search : 2.2 seconds (including the
one time sort).
-
Re: Need advice - what to use CStringArray or CStringList
Quote:
Originally Posted by
Philip Nicoletti
maybe I am getting a little confused on the tests that you ran.
Correct me if I am wrong:
Philip, come on, I said this several times already. You're right in your assumptions, except that you don't see that I'm constructing this array of file paths after finding them out with FindFirstFile/FindNextFile API's. Then during the second pass I open and read each file to get to its contents. That is what is most of that time is used up for when the array is small.
So, guys, what is the bottom line here? That the fastest way would be to use CStringArray, comprised of low-case paths, sorted alphabetically, and then use binary search for a look-up. Is that it?
If so, one question. What is more effecient to add elements according to the sort order, or add them as they are and sort the whole array when we're done?
-
Re: Need advice - what to use CStringArray or CStringList
1) if possible, always add all elements before sorting ... O(nLogN)
versus O(n*n). In your case, you can add all elements first.
2) For small number of elements, adding to a CStringArray is fine. As
the number of elements gets larger, adding to the CStringArray gets
more and more inefficient. I suggest running a benchmarks with
the typical values that you expect. I would use a vector or deque
instead. Again, run a few benchmarks to see the difference in
performance.
3) For large number of files with a large number of searches, you
would have to go with a conatiner that can do fast searches.
Have you tried doing 100,000 searches on 100,000 elements using
a linear search versus doing a binary search ?
Quote:
Philip, come on, I said this several times already. You're right in your assumptions, except that you don't see that I'm constructing this array of file paths after finding them out with FindFirstFile/FindNextFile API's. Then during the second pass I open and read each file to get to its contents. That is what is most of that time is used up for when the array is small.
I am not sure why you think I am not seeing that. Every benchmark
timing that I have provided assumed that.
My conclusiion : For a small number of files, the container
used and the serach method will not take much time. It is
only for a large number of files and searches that the container
and search method matters.
-
Re: [RESOLVED] Need advice - what to use CStringArray or CStringList
OK. Last post. Thank you everyone, who contributed, especially Paul and Philip, I appreciate your insight.