Read binary file with line delimeter

Re: Read binary file with line delimeter

So if there is no 940E sequence at the end, you ignore the sub blocks? Is that right?

Re: Read binary file with line delimeter

Quote:

Originally Posted by 2kaud

So if there is no 940E sequence at the end, you ignore the sub blocks? Is that right?

Yes, 2kaud. If the complete set of conditions is not met, the string doesn't qualify as a sub-block.

It should contain 059X +.. +940E+14 bytes

Re: Read binary file with line delimeter

Try this

Code:

int main()
{
    FileFields ff;

    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    string header;
    ff.getField(header);

    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);

    DWORD number;
    time_t timest = time(NULL);

    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                for (size_t c = 0; c < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && (cx[c + 1] >= '0' && cx[c + 1] <= '7' /*&& cx[c + 1] != '5'*/)) {
                        int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                        if (cx[c + 1] != '5')
                            sub += cx.substr(c, slen) + '|';
                        got4 = (cx[c + 1] == '4');
                        c += slen - 2;
                    }
                if (got4)
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }

    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}

Re: Read binary file with line delimeter

Hello 2kaud,

I've tested your last code and it works, it extracts all substrings expected.

I found that each substring that begins with 9X can take two more values:

9X, where X = 0, 1, 2, 3, 6, 7, A, B

So it can begin with 9A and 9B too. I think this could be a problem, since A and B are not decimal digits.

Now for what I think is the more difficult part. I hope I explain it well.
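On the worry about 9A and 9B: the second character is just a hex digit, so it can be compared as a character or converted with a small helper. A minimal sketch, assuming upper-case hex text as in the dumps in this thread; the names hexVal and isSubBlockCode are made up for illustration and are not part of the posted code:

```cpp
#include <string>

// Value of one upper-case hex digit, or -1 if ch is not a hex digit.
int hexVal(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// True if the two characters at pos form one of the codes 90-97, 9A or 9B
// (the same set the scan loop in this thread accepts).
bool isSubBlockCode(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '9') return false;
    const char x = s[pos + 1];
    return (x >= '0' && x <= '7') || x == 'A' || x == 'B';
}
```

With this, hexVal('D') gives 13, and a code such as 9C is rejected while 9A and 9B are accepted.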

Each substring is composed like this:

When the second byte is 0F (15 bytes follow), the layout is:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte
90-0F-01-02-00000030-8147526905FFFFFF-00

When the second byte is 10 (16 bytes follow), the layout is:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte + 1 byte
93-10-01-0C-0000000D-8147526905FFFFFF-01-01

I'd like to print from byte 4 to the end, converting each group of bytes to decimal, except the group of 8 bytes, which should be printed without conversion but with the F padding removed. For the two sample substrings, the output would be as follows.

For first sample substring:

Code:

900F0102000000308147526905FFFFFF00 --> original substring
90-0F-01-02-00000030-8147526905FFFFFF-00 --> separated into groups
02-00000030-8147526905FFFFFF-00 --> these are the groups I want to print
2,48,8147526905,0 --> separated by commas, in decimal, except the 8-byte section, which only needs the F's removed

For 2nd sample substring:

Code:

9310010C0000000D8147526905FFFFFF0101 --> original substring
93-10-01-0C-0000000D-8147526905FFFFFF-01-01 --> separated into groups
0C-0000000D-8147526905FFFFFF-01-01 --> these are the groups I want to print
12,13,8147526905,1,1 --> separated by commas, in decimal, except the 8-byte section, which only needs the F's removed
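The decomposition described for these two samples could be sketched roughly as below. This is only an illustration under the stated layout (code, length, one skipped byte, one byte, a 4-byte group, an 8-byte F-padded group, then one or two trailing bytes), and it assumes the input is valid upper-case hex text. The names hexFieldToDec and decodeDataSub are hypothetical.

```cpp
#include <string>

// Decimal text of the hex field sub[pos, pos + len).
std::string hexFieldToDec(const std::string& sub, std::size_t pos, std::size_t len) {
    return std::to_string(std::stoul(sub.substr(pos, len), nullptr, 16));
}

// Decode one data sub-block given as hex text, e.g.
// "900F0102000000308147526905FFFFFF00". Returns "" if the text is too short.
std::string decodeDataSub(const std::string& sub) {
    if (sub.size() < 34) return "";
    std::string out = hexFieldToDec(sub, 6, 2);     // 4th byte
    out += ',' + hexFieldToDec(sub, 8, 8);          // 4-byte group
    out += ',';
    for (std::size_t i = 16; i < 32 && sub[i] != 'F'; ++i)
        out += sub[i];                              // 8-byte group, F padding dropped
    for (std::size_t i = 32; i + 1 < sub.size(); i += 2)
        out += ',' + hexFieldToDec(sub, i, 2);      // one or two trailing bytes
    return out;
}
```

Called on the two sample substrings, this yields 2,48,8147526905,0 and 12,13,8147526905,1,1 respectively.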

And when the substring is the last one, the one that begins with 940E followed by 14 bytes, I want to print each of those 14 bytes individually, in decimal.

Code:

940E0001000001000100FFFF00000101 --> original
940E-00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> composed of 14 bytes
00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> these 14 bytes I want to print
0,1,0,0,1,0,1,0,255,255,0,0,1,1 --> separated by commas, in decimal
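For the closing 94 substring, a byte-by-byte conversion along these lines would do. It is a sketch that assumes valid upper-case hex text and takes the byte count from the length byte (0E = 14) rather than hard-coding it; decodeTerminator is a made-up name.

```cpp
#include <string>

// Decode a terminator sub-block ("94" + length byte + payload) given as hex
// text into comma-separated decimal bytes. Returns "" if the text is
// shorter than the length byte claims.
std::string decodeTerminator(const std::string& sub) {
    if (sub.size() < 4) return "";
    const std::size_t count = std::stoul(sub.substr(2, 2), nullptr, 16); // 0E -> 14
    if (sub.size() < 4 + count * 2) return "";
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) out += ',';
        out += std::to_string(std::stoul(sub.substr(4 + i * 2, 2), nullptr, 16));
    }
    return out;
}
```

For the sample above this gives 0,1,0,0,1,0,1,0,255,255,0,0,1,1.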

Currently, the output of your last code with the binSmall file is:

Code:

65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009F8147526905FFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C0000000D81475269565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|

and the output expected is:

Code:

65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,00|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|2,13,8147526905,0|12,14,8147526905,1,1|12,14,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268

Thanks again for all the help.

Re: Read binary file with line delimeter

Accommodating 9A and 9B is trivial:

Code:

for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
    size_t ff79;
    if ((ff79 = block.find(SBLOCK)) != string::npos) {
        size_t five;
        if ((five = block.find("05", ff79)) != string::npos) {
            cx = block.substr(five + 2);
            sub = "";
            bool got4 = false;
            for (size_t c = 0; c < cx.size() && !got4; c += 2)
                if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                    int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                    if (cx[c + 1] != '5')
                        sub += cx.substr(c, slen) + '|';
                    got4 = (cx[c + 1] == '4');
                    c += slen - 2;
                }
            if (got4)
                preliminar += sub;
        }
    }
    cout << number << preliminar << endl;
}

I'll have a look at the decomposition over the next couple of days when I have time.

For what's currently output, what's the speed like for a large file?

Re: Read binary file with line delimeter

Code:

65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009F8147526905FFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C0000000D81475269565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|

and the output expected is:

Code:

65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,00|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|2,13,8147526905,0|12,14,8147526905,1,1|12,14,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268

Shouldn't the expected output for 65401 be 13 rather than the 14 highlighted - as hex D is 13 decimal?

Re: Read binary file with line delimeter

Apart from the issue raised above in post #126, the program below produces the expected output as per your post #124. Have fun!

Code:

#include <iostream>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;

typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef unsigned long int DWORD;

#ifndef LOBYTE
#define LOBYTE(w) ((BYTE)((WORD)(w) & 0xff))
#endif

#ifndef HIBYTE
#define HIBYTE(w) ((BYTE)((WORD)(w) >> 8))
#endif

// Value of the byte whose two hex characters start at cx[c + num]
#define CONVDEC(num) (convh[cx[c + (num)] - '0'] * 16 + convh[cx[c + (num) + 1] - '0'])

// Nibble value -> hex character
const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
// (hex character - '0') -> nibble value, for '0'-'9' and 'A'-'F'
const int convh[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};

const WORD SEPAR = 0xFF77;
const char SBLOCK[] = "FF79";

class FileFields {
private:
    ifstream ifs;
    bool opened;

public:
    FileFields() : opened(false) {}
    ~FileFields() { if (opened) ifs.close(); }
    bool open(const char* name);
    bool getBlock(string& field, DWORD& number, string& firstpart, WORD delim = SEPAR);
    bool getField(string& field, WORD delim = SEPAR);
};

bool FileFields::open(const char* name)
{
    ifs.open(name, ios::binary);
    return (opened = ifs.is_open());
}

// Reads the 3-byte block number and the two 8-byte F-padded fields,
// then the rest of the block as hex text
bool FileFields::getBlock(string& field, DWORD& number, string& firstpart, WORD delim)
{
    BYTE num[3], first[16], by, ub, lb;

    number = 0;
    firstpart = "|";

    if (!opened || !ifs.good())
        return false;

    ifs.read((char*)num, 3);
    number = (num[0] << 16) + (num[1] << 8) + num[2];

    if (!ifs.good())
        return false;

    ifs.read((char*)first, 16);

    for (int p = 1; p <= 2; p++) {
        const int last = p * 8;
        for (int i = (p - 1) * 8; i < last; i++)
            if ((ub = ((by = first[i]) >> 4)) < 0xf) {
                firstpart += hconv[ub];
                if ((lb = (by & 0x0f)) < 0xf)
                    firstpart += hconv[lb];
                else
                    break;
            } else
                break;

        firstpart += '|';
    }

    return getField(field);
}

// Reads bytes as upper-case hex text until the 2-byte delimiter
bool FileFields::getField(string& field, WORD delim)
{
    char by;
    bool cont = true;

    field = "";

    if (!opened || !ifs.good())
        return false;

    for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
        if ((BYTE)by == HIBYTE(delim))
            if ((BYTE)ifs.peek() == LOBYTE(delim))
                cont = false;

        if (cont) {
            field += hconv[(BYTE)by >> 4];
            field += hconv[(BYTE)by & 0xf];
        }
    }

    return true;
}

int main()
{
    FileFields ff;

    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    string header;
    ff.getField(header);

    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);

    DWORD number;
    char num[10];
    time_t timest = time(NULL);

    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                for (size_t c = 0; c < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                        const int slen = CONVDEC(2) * 2;
                        if (got4 = (cx[c + 1] == '4'))
                            // 94 terminator: every payload byte in decimal
                            for (int i = 4; i < slen + 4; i += 2) {
                                sub += _itoa(CONVDEC(i), num, 10);
                                if (i != slen + 2)
                                    sub += ',';
                            }
                        else if (cx[c + 1] != '5') {
                            // Data sub-block: 4th byte, 4-byte group, 8-byte group, trailing byte(s)
                            sub += _itoa(CONVDEC(6), num, 10);
                            sub += ',';
                            int dec = 0;
                            for (int s = 8; s < 16; s += 2)
                                dec = (dec << 8) + CONVDEC(s);
                            sub += _itoa(dec, num, 10);
                            sub += ',';
                            for (size_t s = c + 16; s < c + 32; s++)
                                if (cx[s] != 'F')
                                    sub += cx[s];
                                else
                                    break;
                            sub += ',';
                            sub += _itoa(CONVDEC(32), num, 10);
                            if (slen == 32) {
                                sub += ',';
                                sub += _itoa(CONVDEC(34), num, 10);
                            }
                            sub += '|';
                        }
                        c += slen + 2;
                    }
                if (got4)
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }

    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}

Re: Read binary file with line delimeter

Hello 2kaud,

Thanks! I've tried it and it seems to work just fine, but I'll keep testing, because with one small file I got a segmentation fault
and only the first line was printed. I need to check that file.
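One possible cause (a guess, not confirmed anywhere in this thread) is that the scan loop reads positions such as cx[c + 3] or cx[c + 35] without checking that they exist, which can run past the end of the string when a block is cut short or a length byte is unexpected. A guard along these lines could be tried; checkedSubLength is a hypothetical helper and not part of the posted program:

```cpp
#include <string>

// Total length in hex characters of the sub-block starting at c
// (2 code chars + 2 length chars + payload), or 0 if it does not fit
// inside cx or the length field is not valid upper-case hex.
std::size_t checkedSubLength(const std::string& cx, std::size_t c) {
    if (c > cx.size() || cx.size() - c < 4) return 0;  // no room for code + length
    int len = 0;
    for (std::size_t i = c + 2; i < c + 4; ++i) {
        const char ch = cx[i];
        int v;
        if (ch >= '0' && ch <= '9') v = ch - '0';
        else if (ch >= 'A' && ch <= 'F') v = ch - 'A' + 10;
        else return 0;
        len = len * 16 + v;
    }
    const std::size_t total = 4 + static_cast<std::size_t>(len) * 2;
    return cx.size() - c >= total ? total : 0;
}
```

The idea would be to call it before touching any cx[c + n] and to skip or stop when it returns 0.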

With the previous code, a 2G file was processed in 471 seconds (7.85 min).

The last output I'd like to get is a mapping for the substrings. I mean: when the substring begins with 90, print its values
in column 4; if it begins with 91, print its values in column 5, and so on. If a substring doesn't exist within the sub-block, print an empty
field instead.

The mapping I'd like is as below.

if begins with 90 print its values in 4th column
if begins with 91 print its values in 5th column
if begins with 9A print its values in 6th column
if begins with 92 print its values in 7th column
if begins with 93 print its values in 8th column
if begins with 9B print its values in 9th column
if begins with 96 print its values in 10th column
if begins with 97 print its values in 11th column
if begins with 94 print its values in 12th column
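The column list above amounts to a small lookup from the second character of the 9X code to an output slot. A sketch of that lookup, with a hypothetical function name and zero-based slots (slot 0 is the 4th column, slot 8 the 12th):

```cpp
// Zero-based output slot for the second character of a 9X code, following
// the column list above. Returns -1 for codes that have no column
// (for example 95).
int slotFor(char x) {
    switch (x) {
        case '0': return 0;  // 90 -> 4th column
        case '1': return 1;  // 91 -> 5th column
        case 'A': return 2;  // 9A -> 6th column
        case '2': return 3;  // 92 -> 7th column
        case '3': return 4;  // 93 -> 8th column
        case 'B': return 5;  // 9B -> 9th column
        case '6': return 6;  // 96 -> 10th column
        case '7': return 7;  // 97 -> 11th column
        case '4': return 8;  // 94 -> 12th column
        default:  return -1;
    }
}
```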

So, the current output with your last code is:

Code:

65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|2,13,8147526905,0|12,13,8147526905,1,1|12,13,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|

And desired output

Code:

65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|||12,159,8147526905,1,1||14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|||||||||
65400|532064019659174|81440415266|||||||||
65401|532064019659175|81440415267||2,13,8147526905,0|||12,13,8147526905,1,1||12,13,81475269565,0||1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|||||||||

Thanks for all the help.

Re: Read binary file with line delimeter

If present, do the substrings beginning with 9X always occur in the order 90, 91, 9A, 92, 93, 9B, 96, 97 and 94 - or can they occur in any order with 94 always being the last?

Re: Read binary file with line delimeter

When they appear, the substrings 90, 91, 9A, 92, 93, 9B, 96 and 97 can occur in any order, but the substring 94X... is always at the end.

Re: Read binary file with line delimeter

Yes, I thought you were going to say that! That complicates matters. I'll have to think about this. I'll probably have to create a vector for the substrings with index based upon the 9X code - as I can't just concatenate the output together as I do now. Hmm.

Can you confirm that for any ff79 block, the sub-blocks starting 9x can only appear once but in any order with 94 at the end - ie say 91 sub-block can only occur once and not multiple times?

Incidentally, once you have the output mapped as per post #128, what are you going to do with it?

Re: Read binary file with line delimeter

Quote:

Originally Posted by 2kaud

Incidentally, once you have the output mapped as per post #128, what are you going to do with it?

I understand that printing in that order could be more complicated. I don't know how to change your code to do that, but my idea is to have an array A with A[90]=4, A[91]=5, etc., and an array B with 9 empty values. Then, when the first byte is 90, do B[A[x]-4] = B[A[90]-4] = B[0] = "12,13,814264845,0". This would fill element 0 of array B.
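The idea of nine initially empty slots filled by code could look roughly like this. It is a sketch only: mappedColumns is a made-up name, the values are assumed to be decoded already, and each code is assumed to occur at most once per block.

```cpp
#include <string>
#include <utility>
#include <vector>

// Build the nine mapped columns for one block. Each pair holds the second
// character of the 9X code and the already decoded values of that sub-block.
std::string mappedColumns(const std::vector<std::pair<char, std::string>>& subs) {
    static const std::string order = "01A23B674";  // 90,91,9A,92,93,9B,96,97,94
    std::vector<std::string> slots(order.size());  // all columns start empty
    for (const auto& s : subs) {
        const std::size_t slot = order.find(s.first);
        if (slot != std::string::npos) slots[slot] = s.second;
    }
    std::string out;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (i != 0) out += '|';
        out += slots[i];
    }
    return out;
}
```

Because the slot is chosen by code rather than by arrival order, the sub-blocks can come in any order, and a block with no sub-blocks simply produces eight bare separators.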

It's only an idea.

Regarding your question: since 90, 91, 9A, etc. each belong to a different category, I'd like to print the corresponding values in the same column, so that the result is easy to open in Excel, for example.

Thanks again for the help.

Re: Read binary file with line delimeter

Fine, but you haven't answered my question

Can you confirm that for any individual ff79 block, the sub-blocks starting 9x can only appear once within that block but in any order with 94 at the end - ie say 91 sub-block can only occur once and not multiple times in any one block?

Unless you say differently, I'm going to assume that each 9X can only occur once in the same ff79 block.

Re: Read binary file with line delimeter

Sorry 2kaud.

Yes, each substring that begins with 9X appears only once, in any order, and the substring 940EX... always appears at the end if there is at least one substring.

Re: Read binary file with line delimeter

Hello again,

I've tried your last code in CodeBlocks with GNU GCC and it compiles, but in Visual Studio 2013 I get a compilation error for _itoa() saying "itoa() is not safe, you can use instead itoa_s()".

I changed in all cases from

Code:

_itoa(CONVDEC(6), num, 10)

to

Code:

_itoa_s(CONVDEC(6), num, sizeof(num), 10)

But then it only outputs some of the strings.
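Note that _itoa_s returns an errno_t status rather than the buffer, so it cannot simply replace _itoa inside an expression like sub += .... One portable alternative, assuming a C++11 standard library on both compilers, is to skip the char buffer and use std::to_string. appendDec below is a hypothetical helper, not part of the posted code:

```cpp
#include <string>

// Append the decimal text of value to out; a portable stand-in for
// sub += _itoa(value, num, 10).
void appendDec(std::string& out, int value) {
    out += std::to_string(value);
}
```

With this, a line such as sub += _itoa(CONVDEC(6), num, 10); would become appendDec(sub, CONVDEC(6)); and the num buffer is no longer needed.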