  1. #1
    Join Date
    Mar 2007
    Posts
    20

    [RESOLVED] Searching through a CSV file

    Hi all,

    I don't program in C++ very often, but I was asked to do something for a certain project. My problem: searching for specific indices in a CSV file. I came up with the code below to search for a specific index in a CSV file. It works fine, but I was wondering if I could speed up the process (it is quite 'slow' for large files). Have a look at my code and let me know how to improve it. If you know a totally different method that would work much quicker, then let me know as well.

    Some background on the code:
    mIndex is a map<string, vector<double>> and could look something like:

    "1", 0
    "10", 345.0
    "100", 75453.0
    "2", 331.9
    "20", 8865.6
    ...

    Now I need to locate the above indices (e.g. "1", "10", ...) in a CSV file and, as soon as I find the corresponding line, do some stuff with the data from that line. The CSV file could look like:

    1, PWERT12, 345.67, 12
    2, YFFFER76, 866.32, 06
    3, UMMFR24, 634.98, 02
    ...

    All indices in the CSV file (e.g. 1, 2, 3, ...) are unique and could go up to 10,000. This could make it quite slow.

    Thanks,
    Barbados.

    Code:
    // Some code here
    
    // Using BOOST
    typedef tokenizer < char_separator<char> > tokenizer;
    char_separator<char> sep(",", "", keep_empty_tokens);
    
    vector<string> vec;
    string lineHeader, linePol;
    
    // Loop over all items defined in mIndex
    map<string, long>::const_iterator iter_ii;
    
    for (iter_ii = mIndex.begin(); iter_ii!=mIndex.end(); iter_ii++) {
    
    	// Pick up the index we are after
    	string indexID = (*iter_ii).first;
    
    	// Locate this index in the file we are reading. Always start at the beginning
    	// of the file because we can't assume the file we are reading is sorted.
    	in.clear();
    	in.seekg(0, ios::beg);
    
    	// Skip header lines
    	getline(in,lineHeader);
    	getline(in,lineHeader);
    
    	bool indexFound = false;
    
    	// Start reading the lines containing information
    	while (getline(in,linePol)) {
    		tokenizer tokens(linePol, sep);
    		vec.assign(tokens.begin(),tokens.end());
    
    		// Extract index number (skip any line that produced no tokens)
    		if (vec.empty()) continue;
    		string readID = vec[0];
    		
    		// Check if we found the index we are after 
    		if (readID == indexID) {
    			indexFound = true;
    			// Do some stuff
    			break;
    		}
    	}
    
    	//Do some more stuff
    }

  2. #2
    Join Date
    Apr 2004
    Location
    Canada
    Posts
    1,342

    Re: Searching through a CSV file

    You are reading the file over and over again for each index. There is no need to do that.

    My suggestion would be:
    1) Read the file once, and store it in memory in a map<string, string>, where the key is the index (the first token), and the value is the rest of the line
    2) Now go through your mIndex, and for each index, look up the corresponding element of the map you built in step 1, and do stuff with it. Lookup in a map is very fast (O(log n)).
    Old Unix programmers never die, they just mv to /dev/null

  3. #3
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Searching through a CSV file

    Couldn’t you just go through your CSV file ONCE, and quickly look up if there is a map entry that needs to be processed?
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  4. #4
    Join Date
    Apr 2004
    Location
    Canada
    Posts
    1,342

    Re: Searching through a CSV file

    Originally posted by VladimirF:
    Couldn’t you just go through your CSV file ONCE, and quickly look up if there is a map entry that needs to be processed?
    That depends on whether or not the OP needs the entries of mIndex to be processed in the order they appear in mIndex. If not, then yes, this would be even faster.
    Old Unix programmers never die, they just mv to /dev/null

  5. #5
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: Searching through a CSV file

    If the ordering is important then you could try vector<pair<Key,Value>>
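
    One possible reading of this suggestion, as a sketch only: keep the wanted indices in a vector of pairs (which preserves insertion order, unlike a map sorted by key) and resolve each one against a lookup table built from the CSV. OrderedIndex and rowsInOrder are hypothetical names, and double as the value type is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ordered replacement for mIndex: entries stay in insertion order.
typedef std::vector<std::pair<std::string, double> > OrderedIndex;

// Visit the wanted indices in their stored order and pull each row from a
// pre-built lookup (key = CSV index, value = rest of the line).
// Indices missing from the lookup are skipped.
std::vector<std::string> rowsInOrder(const OrderedIndex& wanted,
                                     const std::map<std::string, std::string>& rows) {
    std::vector<std::string> out;
    for (OrderedIndex::const_iterator it = wanted.begin(); it != wanted.end(); ++it) {
        std::map<std::string, std::string>::const_iterator hit = rows.find(it->first);
        if (hit != rows.end()) out.push_back(hit->second);
    }
    return out;
}
```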
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman
