  1. #1
    Join Date: May 2009 | Location: Boston | Posts: 375

    more questions about getline() getting a specific line

    I am reading in some large files to process. The files need to be parsed into multi-line sub-units, which are then processed (by a different physical process). My current setup is crude: it parses the entire file into memory before it begins processing the sub-units. That is fine as long as the file fits, but with files larger than about 2 GB my machine simply runs out of memory. The simple fix is to read in only part of the file, process what was read, and then read in more. I more or less know how to stop reading at some point in the file, but I'm not sure how to resume from that point later when I need more data. Is there a way to count lines and then call getline() starting from a specific line in the file? From reading the getline() documentation, it doesn't look like there is.
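For reference, iostreams cannot jump to a line number directly, because lines have variable length. What they can do is save a byte position with tellg() and return to it later with seekg(). The sketch below assumes the file may be closed and reopened between batches; ReadLinesFrom is a hypothetical helper, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reopen `path`, seek to the saved byte offset `pos`,
// append up to `max_lines` lines to `out`, and store the resume point back
// into `pos`. Returns true once nothing is left to read.
bool ReadLinesFrom(const std::string &path, std::streampos &pos,
                   std::size_t max_lines, std::vector<std::string> &out) {
    assert(max_lines > 0);  // 0 would make no progress on each call
    std::ifstream in(path);
    if (!in) return true;   // could not open: treat as finished
    in.seekg(pos);          // jump straight to where the last call stopped
    std::string line;
    std::size_t count = 0;
    while (count < max_lines && std::getline(in, line)) {
        out.push_back(line);
        ++count;
    }
    if (!in.good()) return true;  // hit end of file (or a read error)
    pos = in.tellg();             // byte offset of the first unread line
    return false;
}
```

Seeking to a line *number* would still mean reading from the start and counting, which is why the byte offset is saved instead.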

    This is my current function,
    Code:
    void ParseFile(const char *path, char type) {
    
       // open input file
       ifstream input_file(path);
    
       if( !input_file.good() ) { cerr << "Failed to open " << path << endl; exit(ERRCODE_ERROR); }
    
       int pos = 1;
    
       // keep building tasks while the input file is still readable
       while( input_file.good() ) {
    
          // create object for input file data
          Task task;
          // load some data
          task.data = new string;
          task.data->reserve(1024); //Typical size
          task.failed = false; //Initially not failed
          task.pos = pos++;
          task.data->push_back(type);
          task.data->push_back(ineutralize_ctl);
    
          //Read file and add each line to task until end of sub task is reached
          string line;
          while( true ) {
    
             // read next line; std::getline grows the string as needed, so there is
             // no fixed buffer size, and a last line without '\n' is still returned.
             // It fails only when no line could be read (end of file or an error),
             // in which case the unfinished task is discarded.
             if( !getline(input_file, line) ) { delete task.data;  break; }
    
             // add line to task.data, also add EOL (stripped by getline)
             task.data->append(line).append("\n");
    
             // Type 'S': every line is its own sub task; type 'F': a sub task ends at "$$$$"
             if( 'S' == type || ('F' == type && "$$$$" == line) ) {
                // add current task object to tasks list
                tasks.push_back(task);
                break;
             }
    
    // ??? call out from here to process and return to here when list runs out //
    
          } //while( true )
       } //while( input_file.good() )
    
       // a complete read stops at end of file; anything else is a read error
       if( !input_file.eof() ) {
          cerr << "Failed to read file (probably out-of-memory)" << endl;
          exit(ERRCODE_ERROR);
       }
    
       // close the input file
       input_file.close();
    
    return;
    } // EOF endbrace
    I need to move the code that opens the file out of this function, so the file is opened elsewhere.

    I would like to call this function every time I run out of data (when tasks.empty() is true) to get more data from the input file, but I don't know how to resume reading the input file from where I left off.

    If this is not possible, I guess I would have to place a function call inside the while( input_file.good() ) loop to call out and process the data I have. When the current tasks list is finished, the list could be cleared and control returned to the while loop to continue reading input and start re-populating the list.
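The fallback described above (calling out to the processing code from inside the read loop, then clearing the list and continuing) could look roughly like this. It is a hedged sketch: ParseInBatches and its callback are hypothetical, plain strings stand in for the Task objects, and only the "$$$$"-terminated ('F') format is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: read "$$$$"-terminated sub-units from an already-open
// stream and hand them to `process` every `batch_size` sub-units, so only one
// batch is held in memory at a time.
void ParseInBatches(std::ifstream &in, std::size_t batch_size,
                    const std::function<void(std::vector<std::string> &)> &process) {
    assert(batch_size > 0);
    std::vector<std::string> batch;
    std::string record, line;
    while (std::getline(in, line)) {
        record.append(line).append("\n");  // getline strips the '\n'
        if (line == "$$$$") {              // end of one sub-unit
            batch.push_back(record);
            record.clear();
            if (batch.size() == batch_size) {
                process(batch);            // "call out" to process this batch...
                batch.clear();             // ...then keep reading where we were
            }
        }
    }
    if (!batch.empty()) process(batch);    // last, possibly smaller, batch
    // A trailing record without "$$$$" is discarded, as ParseFile() does.
}
```

The trade-off is that the reader now drives the processing; the caller-owned stream approach keeps the two separate.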

    Am I making any sense here? If there is a standard way to implement this kind of thing, I would appreciate the info. I'm fairly sure this is not the first code that has had to process more data than will fit into memory.

    Please let me know if I should post more information.

    LMHmedchem
    Last edited by LMHmedchem; July 5th, 2012 at 06:25 PM.

  2. #2
    Join Date: Jun 2010 | Location: Germany | Posts: 2,675

    Re: more questions about getline() getting a specific line

    Simply open and store the ifstream object in the calling function and pass a reference to it instead of the path (and don't close it in ParseFile()). That way, whenever you get back to the file object later, it will automatically resume reading where you left off.
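A minimal sketch of this suggestion, assuming "$$$$"-terminated sub-units and using plain strings in place of Task; ReadRecords is a hypothetical name. Because the caller owns the stream and nothing closes it, each call continues at the first unread line.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: append up to `max_records` "$$$$"-terminated records
// from `in` to `out`. The stream is owned by the caller and is NOT closed
// here, so the next call picks up at the first unread line.
std::size_t ReadRecords(std::ifstream &in, std::size_t max_records,
                        std::vector<std::string> &out) {
    assert(max_records > 0);  // 0 would look the same as "file ran out"
    std::size_t count = 0;
    std::string record, line;
    while (count < max_records && std::getline(in, line)) {
        record.append(line).append("\n");
        if (line == "$$$$") {
            out.push_back(record);
            record.clear();
            ++count;
        }
    }
    return count;  // fewer than max_records: end of file (or a read error)
}
```

Usage: open the file once, e.g. `std::ifstream in("big.sdf");`, then call `ReadRecords(in, n, tasks)` whenever the list runs dry.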

  3. #3
    Join Date: May 2009 | Location: Boston | Posts: 375

    Re: more questions about getline() getting a specific line

    Thanks, I have rewritten the function to grab n sub-units instead of all of them. A lot of parts of the code assume they are done when the tasks list is empty, so I have changed the function to return input_file.eof(); I can check that return value to know when I am really done. I am still having some issues getting it to loop back when it runs out of tasks, but at least it should start reading the file from the right place once I get that cleaned up.
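One way to structure the loop-back described here is sketched below. FillTasks and ProcessAll are hypothetical names, a std::deque of strings stands in for the real tasks list, and the sketch returns !input_file.good() rather than only input_file.eof(), so a read error cannot leave the driver looping forever.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>

// Hypothetical sketch: load up to n "$$$$"-terminated sub-units into `tasks`.
// Returns true once nothing more can be read (end of file or a stream error).
bool FillTasks(std::ifstream &input_file, std::size_t n, std::deque<std::string> &tasks) {
    assert(n > 0);
    std::string record, line;
    std::size_t added = 0;
    while (added < n && std::getline(input_file, line)) {
        record.append(line).append("\n");
        if (line == "$$$$") { tasks.push_back(record); record.clear(); ++added; }
    }
    return !input_file.good();
}

// Driver: empty the list, refill it from where reading stopped, and quit
// only after FillTasks() reports the end. Returns how many tasks were handled.
std::size_t ProcessAll(std::ifstream &input_file, std::size_t n) {
    std::deque<std::string> tasks;
    std::size_t processed = 0;
    bool done = false;
    while (!done) {
        done = FillTasks(input_file, n, tasks);  // list is empty at this point
        while (!tasks.empty()) {
            // process tasks.front() here
            tasks.pop_front();
            ++processed;
        }
    }
    return processed;
}
```

An empty list then just means "refill", and only the return value of FillTasks() means "finished".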

    LMHmedchem
