-
Read binary file with line delimeter
Hello to all,
First post in this busy forum. I hope someone can help me.
I want to read a binary file using "ff77" as the line separator, so that I can then parse each line one by one with a regex,
since the file is big. I have a small Ruby script, shown below, but I'm new to C++ and don't know how to replicate in C++
what this Ruby code does.
Code:
#!/usr/bin/env ruby
BEGIN{ $/="\xff\x77" } # Line separator = FF77
File.open(ARGV[0],"rb") # Open in binary mode
# Process each line one by one
while gets
line = $_.unpack('H*')[0] #Storing the bytes for each line in "line "variable
next unless line =~ /(..)(\d+)([A-B])/ # Regex with back-reference
printf("%d %s %s\n",$1,$2,$3) #Printing backreferenced patterns
end
I've been looking for a way to set the line delimiter and found the getline function, but it seems getline only accepts a single character
as the delimiter, and I need 4 characters as the line separator.
My unsuccessful attempt is below; it seems this is not the way to do it.
Code:
#include <cstdlib>
#include <fstream>
int main() {
std::ifstream input("C:\\binfile", ios::in | ios::binary);
for( std::string line; getline( input, "ff77" ); )
{
printf("%s",line);
}
return 0;
}
Many thanks in advance for any help.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
Hello to all,
First post in this concurred forum. I hope someone could help me.
I want to read a binary file using as line separator "ff77" in order to parse further each line one by one with some regex
Opening a file in binary mode means that you're on your own and you get no help from C++ as to what is or are "end-of-line" character(s). That luxury goes to opening a file in text mode (and even that is limited).
In other words, there is no such thing as a "line separator" to the C++ stream when you open a file in binary mode. You have to parse the line yourself with the knowledge of what is a "line separator".
Regards,
Paul McKenzie
-
Re: Read binary file with line delimeter
Hello Paul,
Thanks for the answer. I used the term "line separator" to describe how the data is separated into blocks, since each block begins with
77 and ends with FF. So, when FF77 is found, it means a new block begins.
The issue is that I don't know how to separate the blocks in order to parse them one at a time.
Thanks in advance for any help.
-
Re: Read binary file with line delimeter
Well, how would you conceptually read a block of memory and look for delimiters within that block of memory, while retaining the text between the delimiters?
Regards,
Paul McKenzie
-
Re: Read binary file with line delimeter
Hello Paul,
That is close to what I'm asking for help with. I'm really a newbie in programming, and the Ruby code wasn't written by me.
Maybe I could use an if statement to match ff 77 to know where a block begins. Maybe a more direct method exists in C++;
I don't know.
Maybe you or somebody else could help me store each block in a variable, so that I have the option to
parse this string later.
Thanks in advance for the help.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
Hello Paul,
That is something similar to what I'm asking for help, I'm really a newbie in programming, the ruby code wasn't done by me.
Then you need someone already versed in C++ or programming in general to write this code. Or take the time to learn how to conceptualize a problem, write a plan on how to solve the problem using pencil and paper (no code), and then translate what you wrote to C++ code.
Quote:
Maybe use an if statement to match ff 77 to know where begins a block. Maybe exists method more directly in C++,
There isn't one. C++ is not Ruby, and I think this was your initial mistake. You equated what you can do with Ruby in one or two lines of code, and hoped that C++ could do the same thing with similar effort. That is not the case.
For C++, and really, any programming language you have to:
1) Read a block into memory.
2) Search the block of memory for your delimited string sequence.
3) While doing this, retain where the text began and where the delimiter was found -- between these two points is the text.
4) Save this text in some sort of container.
5) Skip over the found delimiter, set the pointer to the characters after the delimiter, and repeat steps 2 through 5.
...
Basically, it is a delimited file parser, with the delimiter equal to "ff77". This is not trivial if you don't know how to write a program. Add to the mix that you have to read the file in chunks, so you have to check whether a read ended on a "partial line", knowing that the next read will give you the rest of that line.
Quote:
Maybe you or somebody else could help me to be able to store each block in a variable to have the option to
parse this string later.
You want a comma-delimited file parser program or function (but allow the "comma" to be some other set of characters that delimits the text). That is as close as you can come to a "canned solution" in C++ (even though it isn't really canned, it's just that someone wrote the function to do so).
Regards,
Paul McKenzie
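Steps 2 through 5 above can be sketched for a buffer that is already in memory. This is only a minimal illustration, not code from the thread: the name splitOn is hypothetical, and it assumes the whole buffer fits in a std::string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `data` on every occurrence of `delim`.
// The piece after the last delimiter is returned as the final element.
std::vector<std::string> splitOn(const std::string& data, const std::string& delim)
{
    std::vector<std::string> pieces;
    if (delim.empty()) {                  // an empty delimiter would never advance
        pieces.push_back(data);
        return pieces;
    }
    std::string::size_type start = 0;     // where the current piece begins
    std::string::size_type hit;           // where the next delimiter was found
    while ((hit = data.find(delim, start)) != std::string::npos) {
        pieces.push_back(data.substr(start, hit - start)); // text between the two points
        start = hit + delim.size();                        // skip over the delimiter
    }
    pieces.push_back(data.substr(start)); // remainder after the last delimiter
    return pieces;
}
```

Calling splitOn("Test1FF77Test2", "FF77") would give the two pieces "Test1" and "Test2". Reading the file in chunks still has to be handled separately, as noted above.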
-
Re: Read binary file with line delimeter
Hello Paul,
Thanks for the help.
I've been able to do steps 2 to 4 and, partially, step 5. I don't know how to write the condition for the while loop so that it stops when no further delimiter is found in the block of memory currently being read.
What I've done is:
Code:
while (not end of current block of memory) { // This is the condition I don't know how could be
x1 = curr_string.find("ff77",x2-1,4);
x2 = curr_string.find("ff77",x1+1,4);
string temp=curr_string.substr(x1, x2 - x1);
}
The condition I've tried is below, but I get an infinite loop:
Code:
curr_string.find("ff77",x1+1,4)
Thanks again.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
Hello Paul,
Thanks for the help.
I've been able to do steps 2 to 4 and partially 5, since I'm don't know how to set the correct condition for the "while loop" to stops when any other delimiter is found in the current block of memory that is being read.
You know how big the block is. The string variable has a size() member function.
Why not start with something simple? Assume the file is comma delimited (a simple 1 character delimiter), and you have to extract the text between the commas. Forget about the file for now; how about a simple hard-coded string:
Code:
#include <string>
#include <vector>
std::vector<std::string> getCommaFields(const std::string& commaStr)
{
//
}
int main()
{
std::vector<std::string> sVector;
sVector = getCommaFields("Test1,Test2,This is test3");
}
The code is supposed to take that string, and extract the text that is between the commas. Each text is stored in the vector of strings and is returned. So on return, sVector must be the following:
Code:
sVector[0] = "Test1"
sVector[1] = "Test2"
sVector[2] = "This is test3"
If you can't write that function, at least to 95% completeness, then you should start here. Once you have it done, look at the code, and change it to try multiple character delimiters.
Regards,
Paul McKenzie
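For the single-character case only, the standard library already offers a shortcut: std::getline accepts one delimiter character. The sketch below uses it on an in-memory string. The function name getCommaFieldsViaGetline is hypothetical, and this approach does not extend to multi-character delimiters, which is why the hand-written search is still worth doing.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on a single ',' using std::getline.
std::vector<std::string> getCommaFieldsViaGetline(const std::string& commaStr)
{
    std::vector<std::string> fields;
    std::istringstream in(commaStr);   // treat the string as an input stream
    std::string field;
    // The third argument is the delimiter; it must be exactly one character.
    while (std::getline(in, field, ','))
        fields.push_back(field);
    return fields;
}
```

One difference from a hand-written splitter: an empty field after a final trailing comma is not reported by this loop.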
-
Re: Read binary file with line delimeter
Hello Paul,
Thanks for the suggestion, I'll try to think how to get a function that works for this.
One question: would this approach still be fine given that the real file I need to read is more than 2 GB? I don't know whether I should read, for example, 1000 bytes at a time and apply the code you suggest to each chunk, or open the complete file.
Thanks again for the help.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
One question, this way would be fine thinking that the real file I need to read is more than 2 GB? since I think if I'll need to read for example 1000 bytes and apply the code you suggests me or open the complete file, I don't know.
What you would do is read (much more than) 1000 bytes into a buffer. Then you parse the buffer for the character sequence that terminates each line.
The issue is that a read may straddle a line or the delimiter sequence itself, in which case the next read of 1,000 bytes completes the string (or line terminator), and you have to take that into consideration.
Regards,
Paul McKenzie
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Paul McKenzie
What you would do is read (much more than) 1000 bytes into a buffer. Then you parse the buffer for the character sequence that terminates each line.
The issue is that if your read straddles a line or the character sequence, which means that the next read of 1,000 bytes completes the string (or line terminator) and you have to take that into consideration.
Regards,
Paul McKenzie
Hello Paul,
Thanks for your reply. I'm taking your suggestions and have been trying the code below. The positions where the commas occur are fine, but I get an error (Run exit value 1) when assigning the substring to V[i] (in red).
I added the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
vector<string> getCommaFields(const string& commaStr)
{
int i = 0;
size_t pos1 = 1;
size_t pos2 = 1;
vector<string> V;
string str=commaStr;
while (pos2<10000) {
pos1 = commaStr.find(",",pos2-1,1);
pos2 = commaStr.find(",",pos1+1,1);
//if (pos2<10000){
// V[i]=commaStr.substr(pos1, pos2 - pos1);
//}
cout<<pos1<<","<<pos2<<","<<str<<endl;
i++;
}
// return(V);
}
int main()
{
//const commaStr = "Test1,Test2,This is test3";
vector<string> sVector;
sVector = getCommaFields("Test1,Test2,Test3,Some text");
}
Thanks in advance for any help.
-
Re: Read binary file with line delimeter
That's because you haven't sized V, so initially V has no elements. Use push_back().
Code:
V.push_back(commaStr.substr(pos1, pos2 - pos1));
Quote:
I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.
When no match is found, find() returns string::npos:
http://www.cplusplus.com/reference/string/string/find/
There are also some logic errors (you only need 1 find in the while loop) but stepping through the code with the debugger and comparing the result with the function design should enable these to be found fairly easily.
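To illustrate the string::npos point: npos is the largest value of the string's size type, which is why it prints as 18446744073709551615 when that type is 64 bits wide. A loop can test against it directly instead of a hard-coded limit. The helper name countOccurrences below is hypothetical and only serves as a small example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count non-overlapping occurrences of `needle` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& needle)
{
    if (needle.empty())
        return 0;
    std::size_t count = 0;
    std::string::size_type pos = 0;
    // find() returns std::string::npos once there is no further match,
    // so this single call is both the search and the loop condition.
    while ((pos = text.find(needle, pos)) != std::string::npos) {
        ++count;
        pos += needle.size();   // continue after the match just found
    }
    return count;
}
```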
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
Hello Paul,
Thanks for your reply, I'm taking your suggestions and I've been trying with the code below, the positions where commas ocurre are fine, but I get errors (Run exit value 1) to assing the substring to the V[i] (in red).
Well, one thing is that you should not assume your string is less than 10,000 characters.
Code:
while (pos2<10000) {
The std::string has a size() function that returns you the number of characters. You should be using the value of size(), and not hard-code 10,000.
Quote:
I'm putting the condition "pos2<10000" because when a value is not found I receive the value 18446744073709551615.
Always know what standard library functions will return:
http://www.cplusplus.com/reference/string/string/find/
Read the section on the return value when the string cannot be found.
Regards,
Paul McKenzie
-
Re: Read binary file with line delimeter
Hello 2kaud and Paul,
Thanks for your help. I was able to write a function that returns the vector elements as Paul described, first with comma delimiters; then I changed the delimiter to "FF77", and the code below seems to work. The element "Test1" is not included because in the real file the first characters shouldn't be considered, so that part is not a mistake.
I removed one find from the loop; maybe you can check whether the code so far has any issues or something to improve.
Besides anything you see that could be improved, I have 2 problems:
1- I get exit value 1 when using the 2 lines in red to get the position of the last field separator.
2- I wanted to replace the delimiter string with a variable, but for some reason the error says that 2 parameters are expected and 3 were provided (this happens if I use the line in blue and replace "FF77" with Sep everywhere).
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
vector<string> getFields(const string& FSepStr)
{
int i = 0;
size_t pos = 1;
size_t LastFS;
vector<string> V;
//string Sep = "FF77";
while (FSepStr.find("FF77",pos+1,4)!=string::npos) {
pos = FSepStr.find("FF77",FSepStr.find("FF77",pos+1,4)-1,4);
if (FSepStr.find("FF77",pos+1,4)!=string::npos){
V.push_back(FSepStr.substr(pos+4, FSepStr.find("FF77",pos+1,4) - pos - 4));
}
i++;
}
return V;
}
int main()
{
const string InputStr = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF772";
vector<string> sVector;
sVector = getFields(InputStr);
//size_t LastFS = InputStr.rfind("FF77");
for (int i=0;i<=sVector.size();i++){
cout<<"V["<<i<<"]="<<sVector[i]<<endl;
}
//cout <<"Last FSep: "<<LastFS<<endl;
}
Output:
Code:
V[0]=Test2
V[1]=Test3
V[2]=Some text
V[3]=other text
RUN FAILED (exit value 1, total time: 90ms)
Thanks again for the help.
-
Re: Read binary file with line delimeter
Code:
for (int i=0; i<=sVector.size();i++)
You are going beyond the bounds of the vector. Vectors (and arrays) in C++ start from 0 and go to n-1, where "n" is the number of elements. If that vector has 10 elements in it, you are erroneously going from 0 to 10 instead of 0 to 9. That's why you have a failure at the end of your program.
Regards,
Paul McKenzie
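A short sketch of the corrected loop bound, using a hypothetical helper formatIndexed that builds the same "V[i]=..." lines as the program above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format each element as "V[i]=value" on its own line.
std::string formatIndexed(const std::vector<std::string>& v)
{
    std::ostringstream out;
    // Valid indices are 0 .. v.size() - 1, so the condition is '<', not '<='.
    for (std::size_t i = 0; i < v.size(); ++i)
        out << "V[" << i << "]=" << v[i] << '\n';
    return out.str();
}
```

While debugging, v.at(i) can be used instead of v[i]; it throws std::out_of_range on a bad index rather than reading past the end.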
-
Re: Read binary file with line delimeter
Quote:
2- I wanted to replace with a variable the delimiter string, but for some reason the error says that is expected 2 parameters and provided 3 (this if I use the line in blue and replace "FF77" with Sep in all places).
Replace Sep in the find() with Sep.c_str()
-
Re: Read binary file with line delimeter
You can simplify the function slightly and also make it more general, so that it can be used to separate fields given any delimiter. A possible way would be:
Code:
vector<string> getFields(const string& FSepStr, const string& Sep = "FF77");
vector<string> getFields(const string& FSepStr, const string& Sep)
{
size_t pos = 0;
size_t LastFS;
vector<string> V;
while ((LastFS = FSepStr.find(Sep.c_str(), pos + 1, Sep.size())) != string::npos)
if ((LastFS = FSepStr.find(Sep.c_str(), (pos = FSepStr.find(Sep.c_str(), LastFS - 1, Sep.size())) + 1, Sep.size())) != string::npos)
V.push_back(FSepStr.substr(pos + Sep.size(), LastFS - pos - Sep.size()));
return V;
}
so that you can specify the separators in the call to getFields if it is not "FF77".
PS You can do this function with just one .find in total rather than the 5 in your original code - as they say in all the best books, I'll leave that as an exercise!
-
Re: Read binary file with line delimeter
Hello Paul,
Thanks for the correction to the for loop. I've changed it.
Hello 2kaud,
Thanks for the suggestion to fix Sep in find() and for the simplification of the function. It took me some time to understand the if statement, haha.
What I'm having trouble with now is how to return the position of the last field separator to the main function. If I use the code as below I get an error, but if I delete the text in red, it works fine.
I need the last position in order to know the offset to use when reading the next 1000 bytes.
By the way, which function can I use to read the binary file in chunks of 1000 bytes that lets me specify an offset and a chunk size? Something like read(file, offset, 1000), so that for the first read the offset would be 0, and after that the offset would be the value of LastPos, up to the end of the file.
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
vector<string> getFields(const string& FSepStr, const string& Sep = "FF77")
{
int i = 0;
size_t pos = 1;
size_t LastFS;
size_t LastPos;
vector<string> V;
while ((LastFS=FSepStr.find(Sep.c_str(),pos+1,Sep.size()))!=string::npos)
if ((LastFS=FSepStr.find(Sep.c_str(),(pos=FSepStr.find(Sep.c_str(),LastFS-1,Sep.size()))+1,Sep.size()))!=string::npos){
V.push_back(FSepStr.substr(pos+Sep.size(), LastFS - pos - Sep.size()));
LastPos=LastFS; //Storing position of last Field Separator
}
return V, LastPos;
}
int main()
{
const string InputStr = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF772";
vector<string> sVector;
sVector = getFields(InputStr);
size_t LastPos;
for (int i=0;i<sVector.size();i++){
cout<<"V["<<i<<"]="<<sVector[i]<<endl;
}
//cout <<"Last FSep: "<<LastPos<<endl;
}
Thanks for the patience and great help!
Regards
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
I need to know what is the last position to know the offset that I need to put to read the next 1000 bytes.
In C++, you can only return 1 entity. That statement with two values doesn't do what you think it does. It invokes the comma operator (do a google on this operator) -- this results in one value being returned.
If you want to return multiple values, read up on structs. Or in your case, since it is only two values, read up on std::pair.
Code:
#include <string>
#include <utility>
#include <vector>
typedef std::vector<std::string> StringVector;
typedef std::pair<StringVector, size_t> ParseInfo;
ParseInfo getFields(const std::string& FSepStr, const std::string& Sep = "FF77")
{
ParseInfo pInfo;
StringVector& V = pInfo.first;
//...
pInfo.second = LastPos;
return pInfo;
}
The pair holds two items, first and second. So one entity is still being returned, but it contains two items.
Regards,
Paul McKenzie
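The struct alternative mentioned above could look like the sketch below. The names ParseResult, tailStart and parseFields are hypothetical. The function returns the complete fields together with the offset just past the last delimiter, which is roughly what the question asks for.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the complete fields plus the offset of the
// first byte after the last delimiter (0 if no delimiter was found).
struct ParseResult {
    std::vector<std::string> fields;
    std::size_t tailStart;
};

ParseResult parseFields(const std::string& data, const std::string& sep)
{
    ParseResult result;
    result.tailStart = 0;
    if (sep.empty())
        return result;
    std::string::size_type hit;
    while ((hit = data.find(sep, result.tailStart)) != std::string::npos) {
        result.fields.push_back(data.substr(result.tailStart, hit - result.tailStart));
        result.tailStart = hit + sep.size();   // first byte after this delimiter
    }
    return result;                             // one entity, two members
}
```

Compared with std::pair, named members such as fields and tailStart tend to read more clearly at the call site than first and second.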
-
Re: Read binary file with line delimeter
If you return only the last position, you'll have a problem dealing with multiple blocks. Your function doesn't return the data from the start of the block to the first delimiter, or the data from the end of the last delimiter to the end of the block. So you don't have the data needed to concatenate the end of one block with the beginning of the next. Your function needs to return the data before the first delimiter and the data after the last delimiter.
-
Re: Read binary file with line delimeter
This version of getFields also returns the data before the first delimiter and after the last delimiter (with just 1 find!). You don't need to return the position of the last delimiter.
Code:
vector<string> getFields(const string& commaStr, const string& sep = "FF77");
vector<string> getFields(const string& commaStr, const string& sep)
{
vector<string> V;
size_t pos1;
for (size_t pos2 = 0; pos1 = pos2, (pos2 = commaStr.find(sep.c_str(), pos1, sep.size())) != string::npos; pos2 += sep.size())
V.push_back(commaStr.substr(pos1, pos2 - pos1));
V.push_back(commaStr.substr(pos1, commaStr.size() - pos1));
return V;
}
So all that needs to be done is to loop, reading blocks until there are no more. Keep the results of the current and previous blocks, and append the first element of the current block to the last element of the previous block.
There is one problem, however. What happens if the delimiter spans two blocks, i.e. FF at the end of one block and 77 at the start of the next? In that case this whole method won't work properly.
-
Re: Read binary file with line delimeter
Hello Paul and 2kaud,
Thanks for the help one more time.
@2kaud
The version with only 1 find is just fine. I really thought about how to do it, but I didn't get it. It's good that the function now returns the first and last elements; that fixes many issues. Thank you.
But IMHO, to handle a delimiter that spans 2 blocks, I need to store the position of the last delimiter, so that I can use that position as the origin (offset) of the next block of 1000 bytes to read.
Paul and 2kaud,
With that in mind, I'm trying to combine Paul's suggestion with 2kaud's last function to get the position of the last delimiter, but it seems I'm not doing it correctly. I get errors on all the lines in red.
Code:
#include <map>
#include <string>
#include <vector>
#include <iostream>
using namespace std;
typedef vector<string> sVector;
typedef pair<sVector, size_t> ParseInfo;
ParseInfo getFields(const string& FSepStr, const string& sep = "FF77")
{
ParseInfo pInfo;
typedef vector<string> V;
sVector& V = pInfo.first;
size_t pos1;
for (size_t pos2 = 0; pos1 = pos2, (pos2 = FSepStr.find(sep.c_str(), pos1, sep.size())) != string::npos; pos2 += sep.size()){
V.push_back(FSepStr.substr(pos1, pos2 - pos1));
pInfo.second = pos2;
}
V.push_back(FSepStr.substr(pos1, FSepStr.size() - pos1)); //Return element after last FSep
return pInfo;
}
int main()
{
const string InputStr = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF7";
vector<string> sVector;
sVector = getFields(InputStr);
for (int i=0;i<sVector.size();i++){
cout<<"V["<<i<<"]="<<sVector[i]<<endl;
}
//cout <<"Last FSep: "<<LastPos<<endl;
}
Thanks again for help so far
-
Re: Read binary file with line delimeter
This is how you might do it
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
typedef vector<string> sVector;
typedef pair<sVector, size_t> ParseInfo;
ParseInfo getFields1(const string& FSepStr, const string& sep = "FF77")
{
size_t pos1;
sVector V;
ParseInfo pInfo;
for (size_t pos2 = 0; pos1 = pos2, (pos2 = FSepStr.find(sep.c_str(), pos1, sep.size())) != string::npos; pos2 += sep.size())
V.push_back(FSepStr.substr(pos1, pos2 - pos1));
V.push_back(FSepStr.substr(pos1, FSepStr.size() - pos1)); //Return element after last FSep
pInfo.first = V;
pInfo.second = pos1;
return pInfo;
}
int main()
{
const string InputStr = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF7";
ParseInfo pi;
sVector sv;
pi = getFields1(InputStr);
for (int i = 0; i < pi.first.size(); i++){
cout << "V[" << i << "]=" << pi.first[i] << endl;
}
cout << "Last FSep: " << pi.second << endl;
return 0;
}
It returns the position of the first char past the end of the last full delimiter. Using your test string, this outputs
Code:
V[0]=Test1
V[1]=Test2
V[2]=Test3
V[3]=Some text
V[4]=other textFF7
Last FSep: 40
The problem is V[4]. It contains FF7 at the end. The tail might be F, FF or FF7. The only way you're going to know whether this is part of a delimiter or some other valid text is to parse V[4] together with V[0] of the next block, which might start with F77, 77 or 7.
-
Re: Read binary file with line delimeter
You don't need to return the position of the last delimiter. A possible way of parsing the blocks is
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
typedef vector<string> sVector;
sVector getFields(const string& FSepStr, const string& sep = "FF77")
{
size_t pos1;
sVector V;
for (size_t pos2 = 0; pos1 = pos2, (pos2 = FSepStr.find(sep.c_str(), pos1, sep.size())) != string::npos; pos2 += sep.size())
V.push_back(FSepStr.substr(pos1, pos2 - pos1));
V.push_back(FSepStr.substr(pos1, FSepStr.size() - pos1)); //Return element after last FSep
return V;
}
void output(const sVector& sv)
{
static int cnt = 0;
for (size_t i = 0; i < sv.size(); i++)
cout << "V[" << cnt++ << "]=" << sv[i] << endl;
}
int main()
{
const string InputStr1 = "Test1FF77Test2FF77Test3FF77Some textFF77other textFF";
const string InputStr2 = "77Test4FF77Test5FF77Test6FF77Some text7FF77other text8";
sVector sv1,
sv2;
string elem;
sv1 = getFields(InputStr1);
sv2 = ((elem = sv1[sv1.size() - 1]) != "") ? getFields(elem + InputStr2) : getFields(InputStr2);
sv1.pop_back();
output(sv1);
output(sv2);
return 0;
}
This gives the output
Code:
V[0]=Test1
V[1]=Test2
V[2]=Test3
V[3]=Some text
V[4]=other text
V[5]=Test4
V[6]=Test5
V[7]=Test6
V[8]=Some text7
V[9]=other text8
which is as required.
-
Re: Read binary file with line delimeter
This is a possible way to process multiple blocks
Code:
int main()
{
sVector blocks;
blocks.push_back("Test1FF77Test2FF77Test3FF77Test4FF77Test5FF");
blocks.push_back("77Test6FF77Test7FF77Test8FF77Test9FF77Test10F");
blocks.push_back("F77Test11FF77Test12FF77Test13FF77Test14FF77Test15FF7");
blocks.push_back("7Test16FF77Test17FF77Test18FF77Test19FF77Test20");
blocks.push_back("FF77Test21FF77Test22FF77Test23FF77Test24FF77Test25FF77");
blocks.push_back("Test26FF77Test27FF77Test28FF77Test29FF77Test30");
string elem;
sVector sv1 = getFields(blocks[0]);
for (size_t b = 1; b < blocks.size(); b++) {
sVector sv2 = ((elem = sv1[sv1.size() - 1]) != "") ? getFields(elem + blocks[b]) : getFields(blocks[b]);
sv1.pop_back();
output (sv1);
sv1 = sv2;
}
output(sv1);
return 0;
}
Producing the output
Code:
V[0]=Test1
V[1]=Test2
V[2]=Test3
V[3]=Test4
V[4]=Test5
V[5]=Test6
V[6]=Test7
V[7]=Test8
V[8]=Test9
V[9]=Test10
V[10]=Test11
V[11]=Test12
V[12]=Test13
V[13]=Test14
V[14]=Test15
V[15]=Test16
V[16]=Test17
V[17]=Test18
V[18]=Test19
V[19]=Test20
V[20]=Test21
V[21]=Test22
V[22]=Test23
V[23]=Test24
V[24]=Test25
V[25]=Test26
V[26]=Test27
V[27]=Test28
V[28]=Test29
V[29]=Test30
-
Re: Read binary file with line delimeter
As you are going to parse blocks, the code below may be of interest. It assumes you have a function that gets a block.
Code:
//get a block to parse
//returns true if block got, false if not got
bool getBlock(string& block)
{
sVector blocks;
blocks.push_back("Test1FF77Test2FF77Test3FF77Test4FF77Test5FF");
blocks.push_back("77Test6FF77Test7FF77Test8FF77Test9FF77Test10F");
blocks.push_back("F77Test11FF77Test12FF77Test13FF77Test14FF77Test15FF7");
blocks.push_back("7Test16FF77Test17FF77Test18FF77Test19FF77Test20");
blocks.push_back("FF77Test21FF77Test22FF77Test23FF77Test24FF77Test25FF77");
blocks.push_back("Test26FF77Test27FF77Test28FF77Test29FF77Test");
blocks.push_back("30FF77Test31");
static int blkno = 0;
if (blkno < blocks.size()) {
block = blocks[blkno++];
return true;
}
block = "";
return false;
}
int main()
{
string block;
sVector sv1,
sv2;
bool got;
for (got = getBlock(block), sv1 = getFields(block); got; sv1 = sv2) {
got = getBlock(block);
sv2 = getFields(sv1[sv1.size() - 1] + block);
sv1.pop_back();
output(sv1);
}
output(sv1);
return 0;
}
-
Re: Read binary file with line delimeter
Hello 2kaud,
Many thanks for the time and help.
I've been trying to test your code, but I get 2 errors for lines in red.
Code:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
typedef vector<string> sVector;
sVector getFields(const string& FSepStr, const string& sep = "FF77")
{
size_t pos1;
sVector V;
for (size_t pos2 = 0; pos1 = pos2, (pos2 = FSepStr.find(sep.c_str(), pos1, sep.size())) != string::npos; pos2 += sep.size())
V.push_back(FSepStr.substr(pos1, pos2 - pos1));
V.push_back(FSepStr.substr(pos1, FSepStr.size() - pos1)); //Return element after last FSep
return V;
}
void output(const sVector& sv)
{
static cnt = 0; //Error: 'cnt' does not name a type
for (size_t i = 0; i < sv.size(); i++)
cout << "V[" << cnt++ << "]=" << sv[i] << endl; //Error: 'cnt' was not declared in this scope
}
//get a block to parse
//returns true if block got, false if not got
bool getBlock(string& block)
{
sVector blocks;
blocks.push_back("Test1FF77Test2FF77Test3FF77Test4FF77Test5FF");
blocks.push_back("77Test6FF77Test7FF77Test8FF77Test9FF77Test10F");
blocks.push_back("F77Test11FF77Test12FF77Test13FF77Test14FF77Test15FF7");
blocks.push_back("7Test16FF77Test17FF77Test18FF77Test19FF77Test20");
blocks.push_back("FF77Test21FF77Test22FF77Test23FF77Test24FF77Test25FF77");
blocks.push_back("Test26FF77Test27FF77Test28FF77Test29FF77Test");
blocks.push_back("30FF77Test31");
static int blkno = 0;
if (blkno < blocks.size()) {
block = blocks[blkno++];
return true;
}
block = "";
return false;
}
int main()
{
string block;
sVector sv1,
sv2;
bool got;
for (got = getBlock(block), sv1 = getFields(block); got; sv1 = sv2) {
got = getBlock(block);
sv2 = getFields(sv1[sv1.size() - 1] + block);
sv1.pop_back();
output(sv1);
}
output(sv1);
return 0;
}
Thanks again
-
Re: Read binary file with line delimeter
-
Re: Read binary file with line delimeter
Thanks so much 2kaud, I was able to test it now and it works just fine.
I'd like to test it with real blocks now. What do you suggest for reading the binary file in chunks of 1000 bytes while being able to
specify an offset?
Would it be something like this?
Code:
ifstream file ("binfile", ios::in|ios::binary|ios::ate);
file.read (block, size);
file.seekg (offset, ios::beg);
file.close();
But I'm not sure whether only 1000 bytes would be in memory at any given moment; I'd like to avoid loading the complete binary file into memory because of its 2 GB size.
Thanks again for all the help.
-
Re: Read binary file with line delimeter
Try this for getBlock.
Code:
bool getBlock(string& block)
{
static ifstream ifs("binfile", ios::binary);
char buf[1001];
if (!ifs.is_open()) {
block = "";
return false;
}
ifs.read(buf, 1000);
buf[ifs.gcount()] = 0;
if (ifs.gcount() > 0) {
block = buf;
return true;
}
ifs.close();
block = "";
return false;
}
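For the offset part of the question, a possible sketch is shown below. It is an assumption-laden illustration rather than the thread's code: readChunk is a hypothetical name, and the bytes are copied into the string with an explicit length, so that 0x00 bytes in the data do not cut the block short.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `size` bytes starting at byte `offset`.
// Returns the bytes actually read (fewer than `size` near the end of the
// file, empty if nothing could be read).
std::string readChunk(std::ifstream& file, std::streamoff offset, std::size_t size)
{
    if (size == 0)
        return std::string();
    std::vector<char> buf(size);
    file.clear();                          // reset eof/fail left by an earlier short read
    file.seekg(offset, std::ios::beg);     // position first, then read
    if (!file)
        return std::string();
    file.read(buf.data(), static_cast<std::streamsize>(size));
    // Pointer + length constructor: embedded 0x00 bytes are kept.
    return std::string(buf.data(), static_cast<std::size_t>(file.gcount()));
}
```

Only `size` bytes are held in the buffer at a time, so the 2 GB file is never loaded whole. Note also that opening with ios::ate places the read position at the end of the file, so a read issued before any seekg would return nothing.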
-
Re: Read binary file with line delimeter
Thanks one more time 2kaud.
I've tested it; it compiles without errors, but it is not printing the content of the blocks. I only receive this.
Code:
V[0]=vEJXXYW1_D1211308071344S dE‰4Uÿw
This is the interpreted content before the first delimiter "ff77". I'm not sure what is happening. Maybe it is not reading the bytes literally, and because of that it is not finding the FF77.
When I see in an hex editor, the first 64 bytes are the following:
Code:
76454a58585957315f44313231313330
38303731333434065320644501893455
ff7700000153206445018934550f8147
4549232fffff0015000a4800015a0002
-
Re: Read binary file with line delimeter
Can you zip the first few thousand bytes of the file and post it so I can take a look? I tried it here on a test text file I created that was over 1000 bytes, and it showed the expected output.
-
Re: Read binary file with line delimeter
The attached file has 5 delimiters "FF77", so, 5 blocks.
I want the bytes stored literally in the block variable, without any conversion to ASCII.
The first 1024 bytes are below, showing the content of the first block and partially the content of 2nd block.
Code:
76454a58585957315f4431323131333038303731333434065320644501893455
ff7700000153206445018934550f81474549232fffff0015000a4800015a0002
4200016000013300013600013700015b00017e00016900006a00007900009300
012200002100010900010a000126000102000104000105000106000110000108
00012b00002c00012d00012e00015500015600072a00002f0000300000310000
ff7900800932c90688888000a000800935c90600008000000080093cc9068888
80008000800943c9068888800080000582003706010000010065000000020000
0200180000000300000300170000000400000400010000000500000500150000
000a00ffff006500000007802ec918059181475269531fffffff009181475269
531fffff000103ca030808fecb0a00000000000000000000cc0101811bc90b00
9181475269567fffffffca06000000000000cb0103cc0101ff77000002532064
45018934551f81474554768fffff0015000a4800015a00024200016000013300
013600013700015b00016600016500017700017800017e00016900006a000079
00009300012200002100010900010a0001260001020001040001050001060001
1000010800012b00002c00012d00012e00015500015600072a00002f00003000
00310000ff7900800932c90688888000a000800935c90600008000000080093c
PS: I've given the file a txt extension to be able to upload it.
Thanks for the help.
-
Re: Read binary file with line delimeter
The delimiter is NOT "FF77". The delimiter is 0xFF77 - which is a different ball game! Also, the data contains 0x00, which is usually used to indicate the end of a string. Basically, the functions work with ASCII text, not binary data. That's why they work fine with the test data but not with your actual binary file.
I'll have a look at the functions over the next few days.
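As a side note, a std::string can itself hold arbitrary bytes, including 0x00, as long as it is built with an explicit length rather than from a C string. Under that assumption, the search-based approach can be adapted by making the delimiter the two bytes 0xFF 0x77. The sketch below is one possible illustration; splitOnFF77 and toHex are hypothetical names, and toHex is only similar in spirit to the unpack('H*') call in the Ruby script.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split raw bytes on the two-byte sequence 0xFF 0x77.
std::vector<std::string> splitOnFF77(const std::string& bytes)
{
    // Two bytes, not the four characters 'F', 'F', '7', '7'.
    const std::string delim("\xFF\x77", 2);
    std::vector<std::string> blocks;
    std::string::size_type start = 0, hit;
    while ((hit = bytes.find(delim, start)) != std::string::npos) {
        blocks.push_back(bytes.substr(start, hit - start));
        start = hit + delim.size();
    }
    blocks.push_back(bytes.substr(start));   // data after the last delimiter
    return blocks;
}

// Hypothetical helper: render bytes as lowercase hex digits.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        out.push_back(digits[b >> 4]);       // high nibble
        out.push_back(digits[b & 0x0F]);     // low nibble
    }
    return out;
}
```

The hex form of each block could then be fed to a regex, as the Ruby script does.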
-
Re: Read binary file with line delimeter
Try this.
Code:
#include <iostream>
#include <fstream>
#include <vector>
#include <iomanip>
using namespace std;
typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef vector<BYTE> bVec;
#ifndef LOBYTE
#define LOBYTE(w) ((BYTE)((WORD)(w) & 0xff))
#endif
#ifndef HIBYTE
#define HIBYTE(w) ((BYTE)((WORD)(w) >> 8))
#endif
class FileFields
{
private:
ifstream ifs;
bool opened;
public:
FileFields() : opened(false) {}
~FileFields() {
if (opened)
ifs.close();
}
bool open(const char* name);
bool getField(bVec& field, WORD delim = 0xFF77);
};
bool FileFields::open(const char* name) {
ifs.open(name, ios::binary);
return (opened = ifs.is_open());
}
bool FileFields::getField(bVec& field, WORD delim)
{
char by;
bool cont = true;
field.clear();
if (!opened || !ifs.good())
return false;
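// Read one byte at a time until the two-byte delimiter or the end of the file.
for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
// High byte matched: peek at the next byte without consuming it.
// If it is the low byte, stop; the loop's final ifs.get(by) then consumes it.
if (by == HIBYTE(delim))
if (ifs.peek() == LOBYTE(delim))
cont = false;
if (cont)
field.push_back(by);
}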
return true;
}
// Print one block as hex, two digits per byte, followed by a blank line.
void display(const bVec& bv)
{
for (size_t i = 0; i < bv.size(); i++)
cout << setw(2) << setfill('0') << hex << (int)bv[i];
cout << endl << endl;
}
int main()
{
FileFields ff;
bVec bv;
if (!ff.open("binary.txt")) {
cout << "Cannot open file!" << endl;
return 1;
}
while (ff.getField(bv))
display(bv);
return 0;
}
-
Re: Read binary file with line delimeter
Hello again 2kaud,
Thank you, this really helps. I'm testing it and the code prints the content of the binary file correctly.
Now I'm trying to print each block separately, like block[0], block[1], etc. as before. I tried doing this in the display function,
but I now see that the function handles one byte at a time.
How can I print each block separately?
Thanks for your help. I'm trying to understand how you do it.
-
Re: Read binary file with line delimeter
Display prints one block followed by a blank line. In the main program, the vector bv holds one block (the data between 2 delimiters) from the file. So you can use the contents of the vector bv to process one block at a time. The class function getField fills the vector with the contents of the next block each time it is called.
If you want to print a block heading before each block, try
Code:
int main()
{
FileFields ff;
bVec bv;
int blk = 0;
if (!ff.open("binary.txt")) {
cout << "Cannot open file!" << endl;
return 1;
}
while (ff.getField(bv)) {
cout << "block " << blk++ << endl;
display(bv);
}
return 0;
}
-
Re: Read binary file with line delimeter
Hello 2kaud,
I'm not sure what is happening, but when I try it that way I get "block 0" and, on the next line, the content of the complete binary file. So it seems that the whole file ends up in block 0.
Code:
block 0
76454a58585957315f4431323131333038303731333434065320644501893455ff7700000......
I don't see clearly yet how to print something like this:
Code:
block 0
76454a58585957315f4431323131333038303731333434065320644501893455
block 1
000001...
block 2
000002...
block 3
000003...
.
.
block N
00000N...
Thanks again for the help.
-
Re: Read binary file with line delimeter
From the partial data file you attached in post #33, the output I get from this program is
Code:
block 0
76454a58585957315f4431323131333038303731333434065320644501893455
block 1
00000153206445018934550f81474549232fffff0015000a4800015a0002420001
00013700015b00017e00016900006a00007900009300012200002100010900010a
010400010500010600011000010800012b00002c00012d00012e00015500015600
300000310000ff7900800932c90688888000a000800935c9060000800000008009
8000800943c9068888800080000582003706010000010065000000020000020018
00170000000400000400010000000500000500150000000a00ffff006500000007
475269531fffffff009181475269531fffff000103ca030808fecb0a0000000000
01811bc90b009181475269567fffffffca06000000000000cb0103cc0101
block 2
00000253206445018934551f81474554768fffff0015000a4800015a0002420001
00013700015b00016600016500017700017800017e00016900006a000079000093
010900010a00012600010200010400010500010600011000010800012b00002c00
5500015600072a00002f0000300000310000ff7900800932c90688888000a00080
00000080093cc906888880008000800943c90688888000800005900f0102000000
ffff00910f01020000013a81475269559fffff009310010c0000009f8147526905
010e000000eb81475269596fffff00970f01010006f69981475269563fffff0094
0100ffff0000010195060003790001ea0582003706010000010065000000020000
00000300170000000400000400010000000500000500150000000a00ffff006500
009181475269539fffffff009181475269539fffff000103ca030808fecb0a0000
00cc0101811bc90b009181475269567fffffffca06000000000000cb0103cc0101
What compiler are you using? I use MSVC.
Try replacing
Code:
if (by == HIBYTE(delim))
if (ifs.peek() == LOBYTE(delim))
with
Code:
if ((BYTE)by == 0xff)
if ((BYTE)ifs.peek() == 0x77)
Is your char type signed or unsigned by default? Mine is unsigned by default. If yours is signed, that may be the problem.
UPDATE: That is probably the problem. I changed my default to signed and got the same issue you have. Putting (BYTE) in the 2 if statements makes it work when the default char is signed.
-
Re: Read binary file with line delimeter
Hello 2kaud,
Thanks!!!
I'm using the MinGW and Dev C++ (TDM GCC) compilers, both with the same issue, but you're right, replacing the 2 if statements with the (BYTE) versions makes it work! Why?
In the main function I had to add "\n" to print each block heading on its own line, as below. I'm not sure if you needed to do that to get the output you posted.
Code:
cout << "\nblock " << blk++ << endl;
Code:
block 0
76454a58585957315f4431323131333038303731333434065320644501893455
block 1
00000153206445018934550f81474549232fffff0015000a4800015a00024200
016000013300013600013700015b00017e00016900006a000079000093000122
00002100010900010a0001260001020001040001050001060001100001080001
2b00002c00012d00012e00015500015600072a00002f0000300000310000ff79
00800932c90688888000a000800935c90600008000000080093cc90688888000
8000800943c90688888000800005820037060100000100650000000200000200
180000000300000300170000000400000400010000000500000500150000000a
00ffff006500000007802ec918059181475269531fffffff009181475269531f
ffff000103ca030808fecb0a00000000000000000000cc0101811bc90b009181
475269567fffffffca06000000000000cb0103cc0101
block 2
00000253206445018934551f81474554768fffff0015000a4800015a00024200
016000013300013600013700015b00016600016500017700017800017e000169
00006a00007900009300012200002100010900010a0001260001020001040001
0500010600011000010800012b00002c00012d00012e00015500015600072a00
002f0000300000310000ff7900800932c90688888000a000800935c906000080
00000080093cc906888880008000800943c90688888000800005900f01020000
00308147526905ffffff00910f01020000013a81475269559fffff009310010c
0000009f8147526905ffffff0101960f010e000000eb81475269596fffff0097
0f01010006f69981475269563fffff00940e0001000001000100ffff00000101
95060003790001ea058200370601000001006500000002000002001800000003
00000300170000000400000400010000000500000500150000000a00ffff0065
00000007802ec918009181475269539fffffff009181475269539fffff000103
ca030808fecb0a00000000000000000000cc0101811bc90b009181475269567f
ffffffca06000000000000cb0103cc0101
block 3
Question:
Does the code load the complete file into memory, or does it only read block by block?
Thanks in advance.
-
Re: Read binary file with line delimeter
Quote:
In main function I had to add "\n" in order to print in different row as below, I'm not sure if you need to do that to get the output you posted.
In function display, have you got the final cout << endl << endl; statement? I get blank lines between my blocks.
Quote:
replacing the 2 if with (BYTE) makes it work! why?
Because if plain char is signed, a byte such as 0xff is stored as a negative value: anything over 127 (0x7f) is treated as negative. In the comparison the char is promoted to int, so it becomes -1, while 0xff is 255, and the comparison is always false! The (BYTE) cast makes the value unsigned, so it can then be compared successfully to 0xff.
-
Re: Read binary file with line delimeter
Thanks, 2kaud, for the explanation. I understand it better now.
Now I see that each block can be processed within the while loop!
Question:
Does the code load the complete file into memory, or does it only read block by block?
Now I only need to add a processing function that parses each block within the while loop, right?
I'd like to parse each block using regex.
Thanks for all the help!
-
Re: Read binary file with line delimeter
I would've taken a more generic approach, utilizing more function parameters instead of hardcoding everything in there...
Code:
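#include <iostream>
#include <string>
#include <vector>
// Split s on every occurrence of find_str; empty pieces are skipped.
void get_fields(const std::string& s, const std::string& find_str, std::vector<std::string>& vec)
{
if (s.find(find_str, 0) == std::string::npos) { vec.push_back(s); return; }
size_t index[2] = { 0, 0 }; // [0] = start of the current piece, [1] = position of the next delimiter
const size_t len = find_str.length();
std::string x; // the current piece
do
{
index[1] = s.find(find_str, index[0]);
if (!(x = s.substr(index[0], index[1] - index[0])).empty()) vec.push_back(x);
index[0] = index[1] + len;
} while (index[1] != std::string::npos);
}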
int main(void)
{
const std::string find_str("FF77");
const std::string s("Test1FF77Test2FF77Test3FF77Some textFF77other textFF772");
std::vector<std::string> v;
get_fields(s, find_str, v);
for (unsigned i = 0; i < v.size(); i++)
std::cout << "v[" << i << "]=" << v[i] << std::endl;
return 0;
}
*edit: Wow, didn't even notice all of the other replies on the last page. I'll need to do more reading.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
AceInfinity
I would've taken a more generic approach, utilizing more function parameters instead of hardcoding everything in there...
The delimiter, as stated in my post #34, is 0xff77 and NOT "FF77". On previous pages we've already covered using string find, which is not what is required when working with binary files.
-
Re: Read binary file with line delimeter
Quote:
Originally Posted by
2kaud
The delimiter, as stated in my post #34, is 0xff77 and NOT "FF77". On previous pages we've already covered using string find, which is not what is required when working with binary files.
No need for the criticism here. I don't see how hard it could be for somebody to change the function parameters around to be honest, although, my post served as an example... Perhaps you have neglected to read my edit? ;)
-
Re: Read binary file with line delimeter
Quote:
Now I only need to add a processing function that parses each block within the while loop, right?
I'd like to parse each block using regex.
What parsing of the block do you require?
-
Re: Read binary file with line delimeter
Hello AceInfinity,
Thanks for trying to help me too. I'll test your code as well.
Hello 2kaud,
For each block, I'd first like to extract some patterns with the 2 regexes below.
Regex1: (.{6,18})(532064[^f]*).(814[^f]*)
Regex2: ff79(0080.{4}c906)?.*?05(9.{32,34}.*?)940e(.{28})
Regex1 matches string in red
Regex2 matches string in blue
Code:
00000253206445018934551f81474554768fffff0015000a4800015a00024200016000013300013600013700015b00016600
016500017700017800017e00016900006a00007900009300012200002100010900010a001260001020001040001050001060
0011000010800012b00002c00012d00012e00015500015600072a00002f0000300000310000ff7900800932c90688888000a
000800935c90600008000000080093cc90688888000000800943c90688888000800005900f0102000000308147526905ffff
ff00910f01020000013a81475269559fffff009310010c0000009f8147526905ffffff0101960f010e000000eb8147526959
6fffff00970f0100006f69981475269563fffff00940e0001000001000100ffff0000010195060003790001ea05820037060
10000010065000000020000020018000000030000030017000000040000040001000000050000050015000000a00ffff0065
00000007802ec918009181475269539fffffff009181475269539fffff000103ca030808fecb0a00000000000000000000cc
0101811bc90b009181475269567fffffffca06000000000000cb0103cc0101
Regex1 extracts data from the beginning of the block. For block 2, Regex1 would get:
Code:
00000253206445018934551f81474554768
but since I'm using capture groups, this string would be split like this
Code:
000002,53206445018934551,81474554768
Finally, the first group is treated as hex and printed as decimal, leaving groups 2 and 3 unchanged, as below.
Code:
2,53206445018934551,81474554768
For Regex2 I'm using capture groups too, since I only want to extract groups 2 and 3 from Regex2. But the
parsing of the string obtained from Regex2 is more complicated. So initially I'd like to know how to apply a regex to each
block.
Many thanks for the help
-
Re: Read binary file with line delimeter
Hello 2kaud,
Thanks for that.
I've tried to add the <regex> header but I get an error. I've read that it might work with the Visual Studio compiler.
I've downloaded and installed a Visual C++ 2012 compiler, but I'm not able to make it work with NetBeans or Dev C++.
Does the <regex> header work for you? Which IDE and compiler are you using?
Thanks in advance
-
Re: Read binary file with line delimeter
I'm using VS 2013 and am able to use the <regex> header with this setup. Dev C++ is old (assuming you're using Bloodshed's and not Orwell's), and if so, you should abandon it.