-
July 25th, 2011, 07:50 PM
#1
[RESOLVED] Searching through a CSV file
Hi all,
I don't program in C++ very often, but I was asked to do something for a project. My problem: searching for specific indices in a CSV file. I came up with the code below to search for a specific index in a CSV file. It works fine, but I was wondering whether I could speed it up (it is quite 'slow' for large files). Have a look at my code and let me know how to improve it. If you know a totally different method that would be much quicker, then let me know as well.
Some background on the code:
mIndex is a map<string, vector<double>> and could look something like:
"1", 0
"10", 345.0
"100", 75453.0
"2", 331.9
"20", 8865.6
...
And now I need to locate the above indices (e.g. "1", "10", ...) in a CSV file and, as soon as I find the corresponding line, do some stuff with the data from that line. The CSV file could look like:
1, PWERT12, 345.67, 12
2, YFFFER76, 866.32, 06
3, UMMFR24, 634.98, 02
...
All indices in the CSV file (e.g. 1, 2, 3, ...) are unique and could go up to 10,000. This could make it quite slow.
Thanks,
Barbados.
Code:
// Some code here
// Using BOOST
typedef tokenizer < char_separator<char> > tokenizer;
char_separator<char> sep(",", "", keep_empty_tokens);
vector<string> vec;
string lineHeader, linePol;
// Loop over all items defined in mIndex
map<string, long>::const_iterator iter_ii;
for (iter_ii = mIndex.begin(); iter_ii != mIndex.end(); ++iter_ii) {
    // Pick up the index we are after
    const string& indexID = iter_ii->first;
    // Locate this index in the file we are reading. Always start at the beginning
    // of the file because we can't assume the file we are reading is sorted.
    in.clear();
    in.seekg(0, ios::beg);
    // Skip the two header lines
    getline(in, lineHeader);
    getline(in, lineHeader);
    bool indexFound = false;
    // Start reading the lines containing information
    while (getline(in, linePol)) {
        tokenizer tokens(linePol, sep);
        vec.assign(tokens.begin(), tokens.end());
        // Skip lines that produced no tokens, so vec[0] is always valid
        if (vec.empty()) {
            continue;
        }
        // Extract index number
        const string& readID = vec[0];
        // Check if we found the index we are after
        if (readID == indexID) {
            indexFound = true;
            // Do some stuff
            break;
        }
    }
    // Do some more stuff
}
-
July 25th, 2011, 09:35 PM
#2
Re: Searching through a CSV file
You are reading the file over and over again for each index. There is no need to do that.
My suggestion would be:
1) Read the file once, and store it in memory in a map<string, string>, where the key is the index (the first token), and the value is the rest of the line
2) Now go through your mIndex, and for each index, look up the corresponding element of the map you built in step 1, and do stuff with it. Lookup in a map is very fast (O(log(n))).
Old Unix programmers never die, they just mv to /dev/null
-
July 26th, 2011, 10:02 AM
#3
Re: Searching through a CSV file
Couldn’t you just go through your CSV file ONCE, and quickly look up if there is a map entry that needs to be processed?
Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
-
July 26th, 2011, 01:49 PM
#4
Re: Searching through a CSV file
Originally Posted by VladimirF
Couldn’t you just go through your CSV file ONCE, and quickly look up if there is a map entry that needs to be processed?
That depends on whether or not the OP needs the entries of mIndex to be processed in the order they appear in mIndex. If not, then yes, this would be even faster.
-
July 27th, 2011, 04:18 AM
#5
Re: Searching through a CSV file
If the ordering is important then you could try vector<pair<Key,Value>>
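One possible reading of that, as a sketch: keep the wanted indices in a vector of pairs so they stay in the order you added them (a std::map<string, ...> would sort them as strings: "1", "10", "100", "2", ...), and look each one up in a map built from the CSV. OrderedIndex, lookupInOrder and the rows map are hypothetical names, and the double value type is an assumption:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed layout: index string plus one value, kept in insertion order
typedef std::vector<std::pair<std::string, double> > OrderedIndex;

// Hypothetical helper: return the CSV data for each wanted index, in the order
// of `wanted`. Indices that are not present in `rows` are skipped.
std::vector<std::string> lookupInOrder(const OrderedIndex& wanted,
                                       const std::map<std::string, std::string>& rows)
{
    std::vector<std::string> out;
    for (OrderedIndex::const_iterator it = wanted.begin(); it != wanted.end(); ++it) {
        std::map<std::string, std::string>::const_iterator hit = rows.find(it->first);
        if (hit != rows.end()) {
            out.push_back(hit->second);
        }
    }
    return out;
}
```

The vector only fixes the processing order; the fast lookup still comes from the map keyed on the CSV index.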
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman