Results 1 to 15 of 24
  1. #1
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    how does getline() know what line it's getting???

    This may be a silly question, but I often use a combination of getline() and fstream in a while loop to read a delimited text file.

    Code:
       ifstream read_file;
       stringstream file_data_stream;
       string new_line, new_cell;
    
       // open the data file into file stream
       read_file.open( input_file.c_str() );
    
      // read in each line of the index file
       while(getline(read_file, new_line)) {
    
          // add current row to stringstream
          file_data_stream << new_line;
    
          // parse stringstream on tab to get fields
          while(getline(file_data_stream, new_cell,'\t')) {
             // the data is now parsed
          }
       }
    This will read through the input_file and parse it into "cells" by line and tab. There doesn't seem to be an iterator to allow getline() to keep track of where it is in the file.
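
    On the question of how getline() keeps its place: there is no separate iterator. The stream object itself holds the current read position, and each getline() call starts where the previous read stopped. One thing to watch in the loop above is that once the inner getline() runs off the end of the stringstream, the stream's fail/eof flags are set, and later << insertions do nothing unless clear() is called. A minimal sketch that sidesteps this by creating a fresh istringstream per line (parse_rows is a hypothetical helper name, not part of the original program):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split every line of `in` on tabs.
// Returns one vector of cells per input line.
std::vector<std::vector<std::string>> parse_rows(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;

    // Each call continues from the stream's current read position,
    // so successive calls return successive lines.
    while (std::getline(in, line)) {
        // A new stream per line starts in a good state; no clear() needed.
        std::istringstream fields(line);
        std::vector<std::string> cells;
        std::string cell;
        while (std::getline(fields, cell, '\t')) {
            cells.push_back(cell);
        }
        rows.push_back(cells);
    }
    return rows;
}
```

    Because the function takes a std::istream&, it should work the same way with an ifstream opened on a data file or with an istringstream used for testing.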

    I need to read in two files so that I am reading the same line of each file one at a time. In other words, read in the first line of file 1 and then the first line of file 2. I can't see how to do this with the structure above.

    If I did something like,

    Code:
       ifstream read_file1, read_file2;
       stringstream file_data_stream1, file_data_stream2;
       string new_line1, new_cell1, new_line2, new_cell2;
    
       // open both files into file streams
       read_file1.open( input_file1.c_str() );
       read_file2.open( input_file2.c_str() );
    
      // read in each line of the first input file
       while(getline(read_file1, new_line1)) {
          // also read the second input file
          getline(read_file2, new_line2);
    
          // add current row to stringstream
          file_data_stream1 << new_line1;
          file_data_stream2 << new_line2;
    
          // parse stringstream for first file on tab to get fields
          while(getline(file_data_stream1, new_cell1,'\t')) {
             // also parse stringstream for second file
              getline(file_data_stream2, new_cell2,'\t');
    
             // the data for the same line of both files is now parsed
    
          }
       }
    Would that read both files in registration? Does getline() have an iterator somewhere I could access to instruct it to read a specific line? Am I going about this in the wrong way altogether?

    I could always just read in one file and store it, but that doesn't seem very efficient in this case.

    LMHmedchem
    Last edited by LMHmedchem; May 13th, 2012 at 09:22 PM.

  2. #2
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: how does getline() know what line it's getting???

    Perhaps you want:
    Code:
    while(getline(read_file1, new_line1) && getline(read_file2, new_line2))
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  3. #3
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by laserlight View Post
    Perhaps you want:
    Code:
    while(getline(read_file1, new_line1) && getline(read_file2, new_line2))
    What would the behavior be here if the two files don't have the same number of lines?

    I was able to get this working with something like,
    Code:
      // read in each line of the first input file
       while(getline(read_file1, new_line1)) {
    
          // also read the second input file
          getline(read_file2, new_line2);
    
          // add current rows to stringstreams
          file_data_stream1 << new_line1;
          file_data_stream2 << new_line2;
    
          // parse stringstream for first file on tab to get fields
          while(getline(file_data_stream1, new_cell1,'\t')) {
              file1_data.push_back(new_cell1);
          }
    
          // parse stringstream for second file on tab to get fields
          while(getline(file_data_stream2, new_cell2,'\t')) {
              file2_data.push_back(new_cell2);
          }
    
       }
    I had to tab parse the new_lines in separate while loops since the number of columns may not be the same in both files.

    I need to create an exception for there being a different number of rows in the two files. My code above will stop when there are no more lines in the first file. If there are fewer lines in the second file than the first, I have a way to determine that. At the moment, I have no way of knowing if there are more lines in the second file.

    I could always open both files and count the lines before I start processing, but that seems inefficient. I can post the whole program if anyone is interested.

    LMHmedchem

  4. #4
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by LMHmedchem
    What would the behavior be here if the two files don't have the same number of lines?
    It is an &&, which means that both conditions must be satisfied for the loop to keep running. Therefore, it will only loop for as many lines as there are in the file with fewer lines.
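
    To illustrate the point about &&, here is a small sketch that counts how many line pairs such a loop processes. The helper name count_paired_lines is hypothetical and only exists for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: count how many line pairs the && loop processes.
std::size_t count_paired_lines(std::istream& first, std::istream& second)
{
    std::string line1, line2;
    std::size_t pairs = 0;

    // The right-hand getline runs only if the left-hand one succeeded.
    while (std::getline(first, line1) && std::getline(second, line2)) {
        ++pairs;
    }
    return pairs;
}
```

    Note the short-circuit behaviour: if the first stream runs out, the second getline() is not called on that pass, and if the second stream is the shorter one, one extra line has already been consumed from the first stream by the time the loop exits.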

  5. #5
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by laserlight View Post
    It is an &&, which means that both conditions must be satisfied for the loop to keep running. Therefore, it will only loop for as many lines as there are in the file with fewer lines.
    I added this code to count the lines in both files and then compare them.

    Code:
       // count the number of rows in the index file
       index_line_size = count(istreambuf_iterator<char>(read_file1), istreambuf_iterator<char>(), '\n');
    
       // count the number of rows in the merge file
       merge_line_size = count(istreambuf_iterator<char>(read_file2), istreambuf_iterator<char>(), '\n');
    
       // make sure that both files have the same number of lines
       if(index_line_size != merge_line_size) {
          cerr << "the index file " << index_data_file << " has " << index_line_size << " lines and" <<endl;
          cerr << "the merge file " << merge_data_file << " has " << merge_line_size << " lines" <<endl;
          cerr << "both files must have the same number of rows" <<endl;
          exit(-3);
       }
    The major annoyance with doing this is that I seem to have to close the files, clear the stringstream, and then open the files again to read and parse them. I guess that's not a big deal, but I can't see any way to count the lines of input and then go back to start reading the first line. I could read the files in separate loops and just store them. Then I could check the size of the containers and process if they match. I don't know if that would be better or not.
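
    Closing and reopening should not be necessary for an ordinary file: a stream can usually be sent back to the beginning with clear() followed by seekg(). A minimal sketch, assuming a seekable stream such as an ifstream on a regular file (count_lines_and_rewind is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: count '\n' characters, then return the stream
// to its beginning so it can be read again without reopening.
std::size_t count_lines_and_rewind(std::istream& in)
{
    const std::size_t newlines = static_cast<std::size_t>(
        std::count(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(), '\n'));

    in.clear();                  // drop eofbit/failbit if earlier reads set them
    in.seekg(0, std::ios::beg);  // move the read position back to the start
    return newlines;
}
```

    After the call, a getline() loop on the same stream starts again from the first line. Keep in mind that counting '\n' gives one less than the number of lines when the last line has no trailing newline.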

    On a pair of files with ~50,000 lines and 30 columns, this runs in ~14s. There is an extra 1s from counting both files.

    LMHmedchem

  6. #6
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: how does getline() know what line it's getting???

    Well, what do you want to do if the number of lines in the files don't match?

    With my suggestion, you can use the eof() member function to check if EOF has been reached after the loop ends, then continue looping over the file that still has lines left to read.

  7. #7
    Join Date
    Jul 2005
    Location
    Netherlands
    Posts
    2,042

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by LMHmedchem View Post
    I added this code to count the lines in both files and then compare them.
    There is no need to parse the files twice if you just want to check that they have the same number of lines. laserlight's suggestion can be easily altered to do that.
    Code:
    bool res1, res2;
    while((res1 = getline(read_file1, new_line1)) &&
          (res2 = getline(read_file2, new_line2)))
    {
        // ...
    }
    if (res1 || res2)
    {
        // number of lines does not match
    }
    It's easy to add some code to count the number of lines if you need that.

    The only reason I can think of to check if the files have the same number of lines first is if you want to provide an error message as soon as possible in case of error.
    Cheers, D Drmmr

    Please put [code][/code] tags around your code to preserve indentation and make it more readable.

    As long as man ascribes to himself what is merely a possibility, he will not work for the attainment of it. - P. D. Ouspensky

  8. #8
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by laserlight View Post
    Well, what do you want to do if the number of lines in the files don't match?
    Quote Originally Posted by D_Drmmr View Post
    The only reason I can think of to check if the files have the same number of lines first is if you want to provide an error message as soon as possible in case of error.
    This is correct, there is an error if the number of lines don't match and the program exits.

    As typically seems to happen, I started by asking one question and moved to another without adding the relevant information. I use this boilerplate code for a lot of text file utilities. This particular one merges two delimited text files on a common index. The files should have the same number of rows, with the index in the same order. I can sort with another tool before the merge if that is necessary. It is very important to verify that the data from the two files remains in registration and that the tool was passed the correct pair of files. This mainly means checking the line count and then matching key values when the output is written.
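
    The line-by-line key check described here could look roughly like the sketch below. It assumes, purely for illustration, that the key is the first tab-delimited field of each line; the names key_of and merge_on_key are hypothetical and not taken from the actual program.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: the key is the first tab-delimited field of each line.
std::string key_of(const std::string& line)
{
    // find() returns npos when there is no tab, so the whole line is the key.
    return line.substr(0, line.find('\t'));
}

// Hypothetical merge: writes line1 followed by the non-key part of line2.
// Returns 0 on success, or the 1-based line number of the first key
// mismatch. Line-count mismatches are not checked here.
std::size_t merge_on_key(std::istream& index, std::istream& merge, std::ostream& out)
{
    std::string line1, line2;
    std::size_t line_no = 0;

    while (std::getline(index, line1) && std::getline(merge, line2)) {
        ++line_no;
        const std::string key = key_of(line1);
        if (key != key_of(line2)) {
            return line_no;  // files are out of registration
        }
        // Drop the duplicate key from the second line, keeping its tab.
        out << line1 << line2.substr(key.size()) << '\n';
    }
    return 0;
}
```

    Returning the line number of the first mismatch makes it possible to report where the two files went out of step.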

    The benefit I can see to pre-counting the lines in the files is that you know right away if there is a mismatch. If I check EOF or when getline() returns false, I won't know there is an issue until one of the files is fully read (or until there is a key mismatch, since I check that line by line).

    Is there a way to count the lines without having to open and close the files, and then open them again to process and parse? I am not parsing the files twice, but I do seem to have to open them twice.

    LMHmedchem

  9. #9
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: how does getline() know what line it's getting???

    If you really want to perform the check before processing, then I think you mainly have to choose between reading the file twice and reading once but saving the lines read. Actually, if the files are expected to normally contain correct input, why not just process and detect the error, and upon error detection, ditch what has been processed?

  10. #10
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by laserlight View Post
    If you really want to perform the check before processing, then I think you mainly have to choose between reading the file twice and reading once but saving the lines read. Actually, if the files are expected to normally contain correct input, why not just process and detect the error, and upon error detection, ditch what has been processed?
    This is probably the best approach since, most of the time, there will not be a problem. The most likely reasons for there being an issue would be if I entered the wrong files in the arguments, or because one of the files was not sorted in the same way as the other. Both of those cases would likely fail quickly because of mismatched key values, and the second problem would not be revealed by line counting. I don't see any likely situation where the error wouldn't be detected until near the end. If it happens, oh well.

    I have test processed some files where the first file is 1.5M and the second is 6.1M and both files have 42,586 rows. It takes ~14s to process these files. That seems a bit on the slow side for a compiled app. Do you see anything here that will be especially slow? I can post the entire code and some test files if anyone wants to have a look. It's about 300 lines.

    LMHmedchem

  11. #11
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: how does getline() know what line it's getting???

    You could profile your code to find out where exactly is the bottleneck.

  12. #12
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    Is that just the -p flag with g++?

    g++ -p -o myApp myApp.cpp

    I don't use gdb because I have had trouble getting it to work with my Fortran code.

    LMHmedchem

  13. #13
    Lindley
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: how does getline() know what line it's getting???

    I think it's -pg, actually.

  14. #14
    Join Date
    May 2009
    Location
    Boston
    Posts
    375

    Re: how does getline() know what line it's getting???

    I am having problems getting the logic to evaluate the way I expect.

    I have the while loop as,

    Code:
       bool have_line1;  bool have_line2;
    
       // get each line from both files in sequence, record in a bool whether a line was received
       while( (have_line1 = getline(read_file1, new_line1)) &&
              (have_line2 = getline(read_file2, new_line2)) ) {
    
          //...
       }
    It seems as if the bool values will always be true as long as the while evaluates as true, so I put the check code after the while loop.

    Code:
       // if one file runs out of lines before the other, this should be triggered
       if(have_line1 != have_line2){
          if(have_line1 == false) {
             cerr << "the index file had fewer lines than the merge file" <<endl;
             cerr << "processing did not complete normally, check output" <<endl;
             exit(-3);
          }
          else if(have_line2 == false) {
             cerr << "the merge file had fewer lines than the index file" <<endl;
             cerr << "processing did not complete normally, check output" <<endl;
             exit(-3);
          }
       }
    This error is always triggered, even when processing completes normally and the printout shows that have_line1= 0 and have_line2= 1. Both files have the same number of lines, so both bool values should be 1 until the file ends, and then both 0.


    I'm not sure about the logic posted by D_Drmmr
    Code:
    if (res1 || res2)
    {
        // number of lines does not match
    }
    That reads to me, if res1 or res2, and both of these should be true until the file finished. I almost never use Boolean logic, so I may be misunderstanding. It seems it should be,

    Code:
    if (!res1 || !res2)
    {
        // number of lines does not match
    }
    which would be if either is false (I think), but both should be false when the while has finished working through the file. It seems as if you are looking for the condition where one is false and one is true, meaning that control dropped out of the while when it was still getting lines from one of the files but not the other.

    Am I missing the point here? I have attached my src and test files if anyone is interested.

    LMHmedchem

  15. #15
    Join Date
    Jul 2005
    Location
    Netherlands
    Posts
    2,042

    Re: how does getline() know what line it's getting???

    Quote Originally Posted by LMHmedchem View Post
    Am I missing the point here? I have attached my src and test files if anyone is interested.
    When you start comparing boolean values with false, it's either time to get some sleep or you've missed the point.

    The loop runs as long as both bools are true. That means that after the loop, at least one of the bools is false. If both are false, the two files have the same number of lines. So only if exactly one of the two is true, you have an error. Now you pick which conditional expression matches that situation.
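
    One detail worth checking against the loop in post #14: && short-circuits, so when the first getline() fails, the assignment to the second bool is skipped and it keeps whatever value it had on the previous pass (true, if any pass completed). That would be consistent with the reported have_line1= 0 and have_line2= 1 for files of equal length. A minimal sketch of one way around it, reading from both streams on every pass before testing the flags. The names LineCountResult and read_in_lockstep are hypothetical, and the static_cast is there because C++11 made the stream-to-bool conversion explicit.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result codes for the comparison below.
enum LineCountResult { SameLength, FirstShorter, SecondShorter };

LineCountResult read_in_lockstep(std::istream& first, std::istream& second)
{
    std::string line1, line2;
    bool have_line1 = false;
    bool have_line2 = false;

    for (;;) {
        // Both reads happen on every pass, so neither flag is stale.
        have_line1 = static_cast<bool>(std::getline(first, line1));
        have_line2 = static_cast<bool>(std::getline(second, line2));
        if (!(have_line1 && have_line2)) {
            break;
        }
        // ... process line1 and line2 here ...
    }

    if (have_line1 == have_line2) {
        return SameLength;  // both streams ended on the same pass
    }
    return have_line1 ? SecondShorter : FirstShorter;
}
```

    With both flags fresh after the loop, the "exactly one is true" test described above can be written as have_line1 != have_line2.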
    Cheers, D Drmmr

