Read binary file with line delimeter

Thread: Read binary file with line delimeter

  1. #1
    Join Date
    Oct 2013
    Posts
    63

    Read binary file with line delimeter

    Hello to all,

    This is my first post in this busy forum. I hope someone can help me.

    I want to read a binary file using "ff77" as the line separator, so that I can then parse each line one by one with a regex,
    since the file is big. I have a small Ruby script, shown below, but I'm new to C++ and I don't know how to replicate in C++
    what the Ruby code does.

    Code:
    #!/usr/bin/env ruby

    SEP = "\xff\x77".b  # Line separator: the two bytes FF 77

    # Open in binary mode and process each line one by one
    File.open(ARGV[0], "rb") do |file|
        file.each_line(SEP) do |record|
            line = record.unpack('H*')[0]  # Hex string of the bytes in this line (lowercase digits)
            next unless line =~ /(..)(\d+)([a-b])/  # Regex with capture groups
            printf("%s %s %s\n", $1, $2, $3)  # Print the captured groups
        end
    end
    I've been looking for a way to set the line delimiter and found the getline function, but it seems getline only accepts a single character,
    and I need 4 characters as the line separator.

    My unsuccessful attempt is below; it seems this is not the way to do it.
    Code:
    #include <cstdlib>
    #include <fstream>
    
    int main() {
        std::ifstream input("C:\\binfile", ios::in | ios::binary);
    
        for( std::string line; getline( input, "ff77" ); )
    {
        printf("%s",line);
    }
        return 0;
    }
    Many thanks in advance for any help.

  2. #2
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote (originally posted by Philidor):
        Hello to all,

        This is my first post in this busy forum. I hope someone can help me.

        I want to read a binary file using "ff77" as the line separator, so that I can then parse each line one by one with a regex

    Opening a file in binary mode means that you're on your own: C++ gives you no help as to what the "end-of-line" character(s) are. That luxury is reserved for files opened in text mode (and even there it is limited).

    In other words, there is no such thing as a "line separator" to the C++ stream when you open a file in binary mode. You have to parse the line yourself with the knowledge of what is a "line separator".

    Regards,

    Paul McKenzie
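
    To make the point concrete, here is a minimal sketch, not a definitive implementation. It assumes the file is small enough to fit in memory and that the separator is the two bytes FF 77 from the Ruby script. The helper names readAllBytes and firstSeparator are hypothetical. std::getline cannot be used for this because its delimiter parameter is a single character.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole (small) file, byte for byte, into a string.
// Binary mode means no newline translation is applied to the data.
std::string readAllBytes(const std::string& path)
{
    std::ifstream input(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(input),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: position of the first FF 77 byte pair in data,
// or std::string::npos when the pair does not occur.
std::size_t firstSeparator(const std::string& data)
{
    // Assumption: the separator is the two bytes 0xFF 0x77, as in the Ruby script.
    static const std::string separator("\xff\x77", 2);
    return data.find(separator);
}
```

    Reading everything at once is only reasonable for small inputs; a very large file has to be read in pieces.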

  3. #3
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello Paul,

    Thanks for the answer. I used the term "line separator" as a way to describe how the data is split into blocks: each block begins with
    77 and ends with FF. So, when FF77 is found, it means a new block begins.

    The issue is that I don't know how to separate the blocks so I can parse them one at a time.

    Thanks in advance for any help.

  4. #4
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Well, how would you conceptually read a block of memory and look for delimiters within that block of memory, while retaining the text between the delimiters?

    Regards,

    Paul McKenzie

  5. #5
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello Paul,

    That is more or less what I'm asking for help with. I'm really a newbie in programming, and the Ruby code wasn't written by me.

    Maybe I could use an if statement to match ff 77 to find where a block begins. Maybe there is a more direct method in C++;
    I don't know.

    Maybe you or somebody else could help me store each block in a variable, so that I have the option to
    parse this string later.

    Thanks in advance for the help.

  6. #6
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote (originally posted by Philidor):
        Hello Paul,

        That is more or less what I'm asking for help with. I'm really a newbie in programming, and the Ruby code wasn't written by me.

    Then you need someone already versed in C++ or programming in general to write this code. Or take the time to learn how to conceptualize a problem, write a plan on how to solve the problem using pencil and paper (no code), and then translate what you wrote to C++ code.

    Quote:
        Maybe I could use an if statement to match ff 77 to find where a block begins. Maybe there is a more direct method in C++

    There isn't one. C++ is not Ruby, and I think this was your initial mistake. You equated what you can do with Ruby in one or two lines of code, and hoped that C++ could do the same thing with similar effort. That is not the case.

    For C++, and really, any programming language, you have to:

    1) Read a block into memory.
    2) Search the block of memory for your delimiter sequence.
    3) While doing this, retain where the text began and where the delimiter was found; between these two points is the text.
    4) Save this text in some sort of container.
    5) Skip over the found delimiter, set the pointer to the characters after the delimiter, and repeat steps 2 through 5.
    ...

    Basically, it is a delimited file parser, with the delimiter equal to "ff77". This is not trivial if you don't know how to write a program. Throw into the mix that you have to read the file in chunks, so you have to check whether you have read only a "partial line", and know that your next read will give you the rest of that line.

    Quote:
        Maybe you or somebody else could help me store each block in a variable, so that I have the option to
        parse this string later.

    You want a comma-delimited file parser program or function (but allow the "comma" to be some other set of characters that delimits the text). That is as close as you can come to a "canned solution" in C++ (even though it isn't really canned, it's just that someone wrote the function to do so).

    Regards,

    Paul McKenzie
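
    Steps 2 through 5 above can be sketched for data that is already in memory. This is an illustrative sketch under stated assumptions, not the only way to do it: the function name splitOnDelimiter is hypothetical, the block is held in a std::string, and the text after the last delimiter is kept as a final field.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text on a delimiter of any length.
std::vector<std::string> splitOnDelimiter(const std::string& text,
                                          const std::string& delim)
{
    std::vector<std::string> fields;
    if (delim.empty()) {              // an empty delimiter would never advance
        fields.push_back(text);
        return fields;
    }

    std::size_t start = 0;                       // step 3: where the current text began
    std::size_t hit = text.find(delim, start);   // step 2: search for the delimiter
    while (hit != std::string::npos) {
        fields.push_back(text.substr(start, hit - start));  // step 4: save the text
        start = hit + delim.size();                          // step 5: skip the delimiter
        hit = text.find(delim, start);                       // ...and search again
    }
    fields.push_back(text.substr(start));        // whatever follows the last delimiter
    return fields;
}
```

    For example, splitOnDelimiter("Test1FF77Test2FF77Test3", "FF77") yields the three fields Test1, Test2 and Test3. Step 1 (reading the file in blocks) is deliberately left out here.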

  7. #7
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello Paul,

    Thanks for the help.

    I've been able to do steps 2 to 4 and, partially, step 5, since I don't know how to set the correct condition for the while loop so that it stops when no further delimiter is found in the current block of memory that is being read.

    What I've done is:
    Code:
     while (not end of current block of memory) { // This is the condition I don't know how could be
          x1 = curr_string.find("ff77",x2-1,4);      
          x2 = curr_string.find("ff77",x1+1,4);
          
          string temp=curr_string.substr(x1, x2 - x1); 
      }
    The condition I've tried is below, but I get an infinite loop:
    Code:
    curr_string.find("ff77",x1+1,4)
    Thanks again.

  8. #8
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote (originally posted by Philidor):
        I've been able to do steps 2 to 4 and, partially, step 5, since I don't know how to set the correct condition for the while loop so that it stops when no further delimiter is found in the current block of memory that is being read.

    You know how big the block is: the string variable has a size() member function.

    Why not start with something simple? Assume the file is comma-delimited (a simple one-character delimiter), and you have to extract the text between the commas. Forget about the file; how about a simple hard-coded string:
    Code:
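    #include <string>
    #include <vector>
    
    std::vector<std::string> getCommaFields(const std::string& commaStr)
    {
        // TODO: extract the text between the commas into a vector
        return std::vector<std::string>();  // placeholder so the skeleton compiles
    }
    
    
    int main()
    {
        std::vector<std::string> sVector;
        sVector = getCommaFields("Test1,Test2,This is test3");
    }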
    The code is supposed to take that string and extract the text that is between the commas. Each piece of text is stored in the vector of strings, which is then returned. So on return, sVector must be the following:
    Code:
    sVector[0] = "Test1"
    sVector[1] = "Test2"
    sVector[2] = "This is test3"
    If you can't write that function, at least to 95% completeness, then you should start here. Once you have it done, look at the code, and change it to try multiple character delimiters.

    Regards,

    Paul McKenzie

  9. #9
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello Paul,

    Thanks for the suggestion, I'll try to think how to get a function that works for this.

    One question: would this approach still be fine given that the real file I need to read is more than 2 GB? I don't know whether I should read, for example, 1000 bytes at a time and apply the code you suggest to each piece, or open the complete file.

    Thanks again for the help.

  10. #10
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote (originally posted by Philidor):
        One question: would this approach still be fine given that the real file I need to read is more than 2 GB? I don't know whether I should read, for example, 1000 bytes at a time and apply the code you suggest to each piece, or open the complete file.

    What you would do is read (much more than) 1000 bytes into a buffer. Then you parse the buffer for the character sequence that terminates each line.

    The issue is that your read may straddle a line or the character sequence, which means that the next read of 1,000 bytes completes the string (or line terminator), and you have to take that into consideration.

    Regards,

    Paul McKenzie

  11. #11
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Quote (originally posted by Paul McKenzie):
        What you would do is read (much more than) 1000 bytes into a buffer. Then you parse the buffer for the character sequence that terminates each line.

        The issue is that your read may straddle a line or the character sequence, which means that the next read of 1,000 bytes completes the string (or line terminator), and you have to take that into consideration.

    Hello Paul,

    Thanks for your reply. I'm taking your suggestions and have been trying the code below. The positions where the commas occur are fine, but I get an error (run exit value 1) when I assign the substring to V[i] (the lines that are commented out below).

    I'm using the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.
    Code:
    #include <string>
    #include <vector>
    #include <iostream>
    
    using namespace std;
    
    vector<string> getCommaFields(const string& commaStr)
    {
     int i = 0;
     size_t pos1 = 1;
     size_t pos2 = 1;
     vector<string> V;
     string str=commaStr;
     while (pos2<10000) {
          pos1 = commaStr.find(",",pos2-1,1); 
          pos2 = commaStr.find(",",pos1+1,1);   
           
     //if (pos2<10000){
       // V[i]=commaStr.substr(pos1, pos2 - pos1); 
     //}
     cout<<pos1<<","<<pos2<<","<<str<<endl;  
     i++;
     }
    // return(V);
    }
    int main()
    {
        //const commaStr = "Test1,Test2,This is test3";
        vector<string> sVector;
        sVector = getCommaFields("Test1,Test2,Test3,Some text");
    }
    Thanks in advance for any help.

  12. #12
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,381

    Re: Read binary file with line delimeter

    That's because you haven't sized V, so initially V has no elements. Use push_back().

    Code:
    V.push_back(commaStr.substr(pos1, pos2 - pos1));
    Quote:
        I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.

    When find() finds no match, it returns string::npos:
    http://www.cplusplus.com/reference/string/string/find/

    There are also some logic errors (you only need 1 find in the while loop) but stepping through the code with the debugger and comparing the result with the function design should enable these to be found fairly easily.
    Last edited by 2kaud; October 9th, 2013 at 05:38 PM.
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
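
    As a small illustration of both points (growing the vector with push_back() and testing against string::npos), here is a hedged sketch that only collects the comma positions. The function name commaPositions is hypothetical. std::string::npos is the largest value the string's size type can hold, which is where the number 18446744073709551615 comes from on a typical 64-bit build.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the index of every comma in text.
std::vector<std::size_t> commaPositions(const std::string& text)
{
    std::vector<std::size_t> positions;        // starts empty, so positions[0] would be out of bounds
    std::size_t pos = text.find(',');
    while (pos != std::string::npos) {         // npos means "no further comma"
        positions.push_back(pos);              // push_back grows the vector by one element
        pos = text.find(',', pos + 1);         // a single find per loop iteration
    }
    return positions;
}
```

    For the string "Test1,Test2,Test3,Some text" this returns the positions 5, 11 and 17.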

  13. #13
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote (originally posted by Philidor):
        Hello Paul,

        Thanks for your reply. I'm taking your suggestions and have been trying the code below. The positions where the commas occur are fine, but I get an error (run exit value 1) when I assign the substring to V[i].

    Well, one thing is that you should not assume your string is less than 10,000 characters.
    Code:
     while (pos2<10000) {
    The std::string has a size() function that returns the number of characters. You should be using the value of size(), and not hard-coding 10,000.

    Quote:
        I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.

    Always know what standard library functions will return:

    http://www.cplusplus.com/reference/string/string/find/

    Read the section on the return value when the string cannot be found.

    Regards,

    Paul McKenzie

  14. #14
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello 2kaud and Paul,

    Thanks for your help. I was able to write a function that returns the vector elements with comma delimiters, as Paul said, and then I changed the delimiter to "FF77"; the code below seems to work. The element "Test1" is not included because in the real file the leading characters should be ignored, so that part is not incorrect.

    I removed one find from the loop. Maybe you can see whether the code so far has any issues or anything to improve.

    Besides any issues you may spot, I have 2 problems:
    1- I get exit value 1 when I use the 2 commented-out LastFS lines in main to get the position of the last field separator.
    2- I wanted to replace the delimiter string with a variable, but for some reason the error says that 2 parameters are expected and 3 were provided (this happens if I use the commented-out Sep line and replace "FF77" with Sep in all places).

    Code:
    #include <string>
    #include <vector>
    #include <iostream>
    
    using namespace std;
    
    vector<string> getFields(const string& FSepStr)
    {
     int i = 0;
     size_t pos = 1;
     size_t LastFS;
     vector<string> V;
     
    //string Sep = "FF77";
     while (FSepStr.find("FF77",pos+1,4)!=string::npos) {
          pos = FSepStr.find("FF77",FSepStr.find("FF77",pos+1,4)-1,4); 
           
          if (FSepStr.find("FF77",pos+1,4)!=string::npos){
            V.push_back(FSepStr.substr(pos+4, FSepStr.find("FF77",pos+1,4) - pos - 4));        
          } 
          i++;
     }
     return V;
    }
    int main()
    {
        const string InputStr = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF772";
        vector<string> sVector;
        sVector = getFields(InputStr);
        //size_t LastFS = InputStr.rfind("FF77");
        
        for (int i=0;i<=sVector.size();i++){
            cout<<"V["<<i<<"]="<<sVector[i]<<endl;
        }
        //cout <<"Last FSep: "<<LastFS<<endl;
    }
    Output:
    Code:
    V[0]=Test2
    V[1]=Test3
    V[2]=Some text
    V[3]=other text
    
    RUN FAILED (exit value 1, total time: 90ms)
    Thanks again for the help.
    Last edited by Philidor; October 10th, 2013 at 12:47 AM.
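
    On the second problem, a likely explanation is the set of find() overloads: the three-argument form takes a C string, a start position and a character count, while the form that takes a std::string accepts only the string and an optional start position. The sketch below shows both calls; the function names findSeparator and findSeparatorCStr are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: search with a std::string separator.
// This overload takes the separator and a start position, nothing more.
std::size_t findSeparator(const std::string& text, const std::string& sep,
                          std::size_t from)
{
    return text.find(sep, from);
}

// Hypothetical helper: the three-argument overload expects a const char*
// plus the number of characters of it to use.
std::size_t findSeparatorCStr(const std::string& text, const std::string& sep,
                              std::size_t from)
{
    return text.find(sep.data(), from, sep.size());
}
```

    Both helpers return the same position for the same input; the first form is usually the simpler choice when the separator is already a std::string.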

  15. #15
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Code:
    for (int i=0; i<=sVector.size();i++)
    You are going beyond the bounds of the vector. Vectors (and arrays) in C++ start from 0 and go to n-1, where "n" is the number of elements. If that vector has 10 elements in it, you are erroneously going from 0 to 10 instead of 0 to 9. That's why you have a failure at the end of your program.

    Regards,

    Paul McKenzie
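
    A minimal sketch of the corrected loop bound follows. The helper name formatFields is hypothetical, and it builds the output as a string instead of printing it so that the result is easy to check; the point is the condition i < fields.size().

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format every element as "V[i]=value", one per line.
std::string formatFields(const std::vector<std::string>& fields)
{
    std::ostringstream out;
    // Valid indices run from 0 to size() - 1, so the test is i < size().
    for (std::size_t i = 0; i < fields.size(); ++i) {
        out << "V[" << i << "]=" << fields[i] << '\n';
    }
    return out.str();
}
```

    Using std::size_t for the index also avoids comparing a signed int with the unsigned value returned by size().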
