-
October 28th, 2013, 03:22 PM
#121
Re: Read binary file with line delimeter
So if there is no 940E sequence at the end, you ignore the sub blocks? Is that right?
All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!
C++23 Compiler: Microsoft VS2022 (17.6.5)
-
October 28th, 2013, 03:32 PM
#122
Re: Read binary file with line delimeter
Originally Posted by 2kaud
So if there is no 940E sequence at the end, you ignore the sub blocks? Is that right?
Yes, 2kaud. If the complete conditions are not present, then the string doesn't qualify as a sub-block.
It should contain 059X + ... + 940E + 14 bytes.
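The qualifying rule above (a run of 9X sub-blocks that must end with a complete 940E sub-block plus its 14 bytes) can be sketched as a length-driven scan. This is a minimal sketch, assuming the block has already been turned into upper-case hex text by getField and that cx is the text after the 05 marker, as in the loops in this thread. collectSubBlocks, hexNibble and hexByte are hypothetical names. Unlike those loops, it does not filter the second digit of the id, and it additionally checks that the 94 length byte is 0E.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Value of one upper-case hex digit ('0'-'9', 'A'-'F'), or -1 if it is not one.
int hexNibble(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Value of the hex byte at text[pos], text[pos + 1], or -1 if either char is not hex.
int hexByte(const std::string& text, std::size_t pos) {
    assert(pos + 1 < text.size());
    const int hi = hexNibble(text[pos]), lo = hexNibble(text[pos + 1]);
    return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

// Scans the hex text after "05" for 9X sub-blocks (id byte, length byte, payload).
// Succeeds only if the run ends with a complete 94 sub-block whose length byte is 0E;
// only then are the collected sub-blocks moved into 'out'.
bool collectSubBlocks(const std::string& cx, std::vector<std::string>& out) {
    std::vector<std::string> found;
    std::size_t p = 0;
    while (p + 4 <= cx.size()) {                    // room for the id and length bytes
        if (cx[p] != '9') {                          // not a sub-block id: skip one byte
            p += 2;
            continue;
        }
        const int len = hexByte(cx, p + 2);          // payload length in bytes
        if (len < 0)
            return false;
        const std::size_t total = 4 + 2 * static_cast<std::size_t>(len);  // whole sub-block, hex chars
        if (p + total > cx.size())                   // truncated sub-block
            return false;
        found.push_back(cx.substr(p, total));
        if (cx[p + 1] == '4') {                      // 94 closes the run
            if (len != 0x0E)
                return false;
            out.swap(found);
            return true;
        }
        p += total;
    }
    return false;                                    // no 940E terminator
}
```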
-
October 28th, 2013, 03:34 PM
#123
Re: Read binary file with line delimeter
Try this
Code:
int main()
{
    FileFields ff;
    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    string header;
    ff.getField(header);    // read (and ignore) the file header

    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);
    DWORD number;
    time_t timest = time(NULL);

    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                // Walk the hex text one byte (2 chars) at a time; c + 3 < size keeps cx[c + 3] in range
                for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && (cx[c + 1] >= '0' && cx[c + 1] <= '7' /*&& cx[c + 1] != '5'*/)) {
                        // Whole sub-block in hex chars: 2 per payload byte plus 4 for the 9X and length bytes
                        int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                        if (cx[c + 1] != '5')
                            sub += cx.substr(c, slen) + '|';
                        got4 = (cx[c + 1] == '4');    // 94 marks the last sub-block
                        c += slen - 2;                // the loop's c += 2 then moves past the whole sub-block
                    }
                if (got4)                             // keep the sub-blocks only if the 94 terminator was found
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }
    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}
-
October 29th, 2013, 01:38 AM
#124
Re: Read binary file with line delimeter
Hello 2kaud,
I've tested your last code and it works; it extracts all the expected substrings.
I found that each substring that begins with 9X... can take 2 more values:
9X, where X=0,1,2,3,6,7,A,B
So a substring can also begin with 9A or 9B. I think this could be a problem, since A and B are not decimal digits.
Now comes what I think is the more difficult part. I hope I explain it well.
Each substring is composed like this:
When the second byte is 0F (15 bytes), it is laid out like this:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte
90-0F-01-02-00000030-8147526905FFFFFF-00
When the second byte is 10 (16 bytes), it is laid out like this:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte + 1 byte
93-10-01-0C-0000000D-8147526905FFFFFF-01-01
I'd like to print from byte 4 to the end, converting each group of bytes to decimal, except the 8-byte group, which should be printed without conversion but with the "F"s removed. So for the two example substrings the output would be:
For the first sample substring:
Code:
900F0102000000308147526905FFFFFF00 --> original substring
90-0F-01-02-00000030-8147526905FFFFFF-00 --> separating in groups
02-00000030-8147526905FFFFFF-00 --> These are the groups I want to print
2,48,8147526905,0 --> separated by commas, in decimal, except the 8-byte section, from which only the "F"s are removed.
For the second sample substring:
Code:
9310010C0000000D8147526905FFFFFF0101 --> original substring
93-10-01-0C-0000000D-8147526905FFFFFF-01-01 --> separating in groups
0C-0000000D-8147526905FFFFFF-01-01 --> These are the groups I want to print
12,13,8147526905,1,1 --> separated by commas, in decimal, except the 8-byte section, from which only the "F"s are removed.
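The layout described above can be decoded with a short function. This is a minimal sketch, assuming the sub-block is upper-case hex text starting at its 9X id and that only the 0F and 10 lengths occur for these ids. decodeDataSubBlock, hexNibble and hexByte are hypothetical names, and the 4-byte group is read as a big-endian unsigned value, which matches 00000030 giving 48.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Value of one upper-case hex digit ('0'-'9', 'A'-'F').
unsigned hexNibble(char ch) {
    if (ch >= '0' && ch <= '9')
        return static_cast<unsigned>(ch - '0');
    assert(ch >= 'A' && ch <= 'F');
    return static_cast<unsigned>(ch - 'A' + 10);
}

// Value of the hex byte at s[pos], s[pos + 1].
unsigned hexByte(const std::string& s, std::size_t pos) {
    assert(pos + 1 < s.size());
    return hexNibble(s[pos]) * 16 + hexNibble(s[pos + 1]);
}

// Decodes a sub-block with length byte 0F or 10, given as hex text starting at its 9X id:
//   id | length | byte 3 | byte 4 | 4-byte value | 8-byte digit field | 1 or 2 final bytes
// Output: "byte4,value,digits,final[,final2]" with the digit field's F padding dropped.
std::string decodeDataSubBlock(const std::string& s) {
    const unsigned len = hexByte(s, 2);                    // payload bytes: 0x0F or 0x10
    assert(len == 0x0F || len == 0x10);
    assert(s.size() >= 4 + 2 * static_cast<std::size_t>(len));

    std::string out = std::to_string(hexByte(s, 6));       // byte 4
    std::uint32_t value = 0;                               // bytes 5-8, big-endian
    for (std::size_t i = 8; i < 16; i += 2)
        value = (value << 8) | hexByte(s, i);
    out += ',' + std::to_string(value);
    out += ',';
    for (std::size_t i = 16; i < 32 && s[i] != 'F'; ++i)  // digits up to the first F
        out += s[i];
    out += ',' + std::to_string(hexByte(s, 32));           // byte 17
    if (len == 0x10)
        out += ',' + std::to_string(hexByte(s, 34));       // byte 18, 16-byte form only
    return out;
}
```

Using an unsigned 32-bit value for the 4-byte group means values of 0x80000000 and above print correctly instead of overflowing a signed int.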
And when the substring is the last one, the one that begins with 940E followed by 14 bytes, I want to print each of those 14 bytes individually, in decimal:
Code:
940E0001000001000100FFFF00000101 --> original
940E-00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> composed of 14 bytes
00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> These 14 bytes I want to print
0,1,0,0,1,0,1,0,255,255,0,0,1,1 --> separated by commas, in decimal
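Splitting the closing sub-block into single bytes is simpler. This is a minimal sketch, assuming the same upper-case hex text starting at 940E. decodeTrailerSubBlock, hexNibble and hexByte are hypothetical names, and the loop follows the length byte (0E = 14) rather than hard-coding 14:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one upper-case hex digit ('0'-'9', 'A'-'F').
unsigned hexNibble(char ch) {
    if (ch >= '0' && ch <= '9')
        return static_cast<unsigned>(ch - '0');
    assert(ch >= 'A' && ch <= 'F');
    return static_cast<unsigned>(ch - 'A' + 10);
}

// Value of the hex byte at s[pos], s[pos + 1].
unsigned hexByte(const std::string& s, std::size_t pos) {
    assert(pos + 1 < s.size());
    return hexNibble(s[pos]) * 16 + hexNibble(s[pos + 1]);
}

// Decodes the closing 94 sub-block: every payload byte in decimal, comma separated.
std::string decodeTrailerSubBlock(const std::string& s) {
    const std::size_t len = hexByte(s, 2);             // 0x0E = 14 payload bytes
    assert(s.size() >= 4 + 2 * len);
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        if (i != 0)
            out += ',';
        out += std::to_string(hexByte(s, 4 + 2 * i));  // byte i of the payload
    }
    return out;
}
```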
Currently, the output of your last code on the binSmall file is:
Code:
65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009F8147526905FFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C0000000D81475269565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|
and the expected output is:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,00|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|2,13,8147526905,0|12,14,8147526905,1,1|12,14,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268
Thanks again for all the help.
-
October 29th, 2013, 04:38 AM
#125
Re: Read binary file with line delimeter
Accommodating 9A and 9B is trivial:
Code:
for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
    size_t ff79;
    if ((ff79 = block.find(SBLOCK)) != string::npos) {
        size_t five;
        if ((five = block.find("05", ff79)) != string::npos) {
            cx = block.substr(five + 2);
            sub = "";
            bool got4 = false;
            // c + 3 < size keeps cx[c + 3] in range
            for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                // accept ids 90-97, 9A and 9B
                if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                    int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                    if (cx[c + 1] != '5')
                        sub += cx.substr(c, slen) + '|';
                    got4 = (cx[c + 1] == '4');
                    c += slen - 2;
                }
            if (got4)
                preliminar += sub;
        }
    }
    cout << number << preliminar << endl;
}
I'll have a look at the decomposition over the next couple of days when I have time.
For the current output, what's the speed like on a large file?
-
October 29th, 2013, 06:57 AM
#126
Re: Read binary file with line delimeter
Shouldn't the expected output for 65401 show 13 rather than 14 (in the 93 and 96 sub-blocks), since hex D is 13 decimal?
-
October 29th, 2013, 08:33 AM
#127
Re: Read binary file with line delimeter
Apart from the issue raised above in post #126, the program below produces the expected output as per your post #124. Have fun!
Code:
#include <iostream>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;

typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef unsigned long int DWORD;

#ifndef LOBYTE
#define LOBYTE(w) ((BYTE)((WORD)(w) & 0xff))
#endif
#ifndef HIBYTE
#define HIBYTE(w) ((BYTE)((WORD)(w) >> 8))
#endif

// Decimal value of the hex byte held in cx[c + num] and cx[c + num + 1]
#define CONVDEC(num) (convh[cx[c + (num)] - '0'] * 16 + convh[cx[c + (num) + 1] - '0'])

const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
// Indexed by (hex char - '0'): '0'-'9' give 0-9, 'A'-'F' (indexes 17-22) give 10-15
const int convh[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};
const WORD SEPAR = 0xFF77;      // 2-byte field delimiter
const char SBLOCK[] = "FF79";   // start of the ff79 block

class FileFields
{
private:
    ifstream ifs;
    bool opened;

public:
    FileFields() : opened(false) {}
    ~FileFields() {
        if (opened)
            ifs.close();
    }

    bool open(const char* name);
    bool getBlock(string& field, DWORD& number, string& firstpart, WORD delim = SEPAR);
    bool getField(string& field, WORD delim = SEPAR);
};

bool FileFields::open(const char* name) {
    ifs.open(name, ios::binary);
    return (opened = ifs.is_open());
}

// Reads the 3-byte record number and the two 8-byte fields in front of a block,
// then the block itself as hex text
bool FileFields::getBlock(string& field, DWORD& number, string& firstpart, WORD delim)
{
    BYTE num[3],
         first[16],
         by,
         ub,
         lb;

    number = 0;
    firstpart = "|";
    if (!opened || !ifs.good())
        return false;

    ifs.read((char*)num, 3);
    number = (num[0] << 16) + (num[1] << 8) + num[2];
    if (!ifs.good())
        return false;

    ifs.read((char*)first, 16);
    // Each 8-byte field is printed as hex digits up to its first F nibble
    for (int p = 1; p <= 2; p++) {
        const int last = p * 8;
        for (int i = (p - 1) * 8; i < last; i++)
            if ((ub = ((by = first[i]) >> 4)) < 0xf) {
                firstpart += hconv[ub];
                if ((lb = (by & 0x0f)) < 0xf)
                    firstpart += hconv[lb];
                else
                    break;
            } else
                break;
        firstpart += '|';
    }
    return getField(field);
}

// Appends each byte to field as 2 hex chars until the 2-byte delimiter is reached
bool FileFields::getField(string& field, WORD delim)
{
    char by;
    bool cont = true;

    field = "";
    if (!opened || !ifs.good())
        return false;

    for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
        if ((BYTE)by == HIBYTE(delim))
            if ((BYTE)ifs.peek() == LOBYTE(delim))
                cont = false;
        if (cont) {
            field += hconv[(BYTE)by >> 4];
            field += hconv[(BYTE)by & 0xf];
        }
    }
    return true;
}

int main()
{
    FileFields ff;
    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    string header;
    ff.getField(header);

    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);
    DWORD number;
    char num[12];    // sign + 10 digits + '\0': enough for any 32-bit int in decimal
    time_t timest = time(NULL);

    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                        const int slen = CONVDEC(2) * 2;       // payload length in hex chars
                        if (c + 4 + slen > cx.size())         // truncated sub-block: stop rather than read past the end of cx
                            break;
                        if (got4 = (cx[c + 1] == '4'))        // 94: each payload byte in decimal
                            for (int i = 4; i < slen + 4; i += 2) {
                                sub += _itoa(CONVDEC(i), num, 10);
                                if (i != slen + 2)
                                    sub += ',';
                            }
                        else
                            if (cx[c + 1] != '5') {           // other ids except 95: byte 4, value, digits, last byte(s)
                                sub += _itoa(CONVDEC(6), num, 10);
                                sub += ',';
                                int dec = 0;
                                for (int s = 8; s < 16; s += 2)
                                    dec = (dec << 8) + CONVDEC(s);
                                sub += _itoa(dec, num, 10);
                                sub += ',';
                                for (size_t s = c + 16; s < c + 32; s++)
                                    if (cx[s] != 'F')
                                        sub += cx[s];
                                    else
                                        break;
                                sub += ',';
                                sub += _itoa(CONVDEC(32), num, 10);
                                if (slen == 32) {
                                    sub += ',';
                                    sub += _itoa(CONVDEC(34), num, 10);
                                }
                                sub += '|';
                            }
                        c += slen + 2;                        // with the loop's c += 2, skips the whole sub-block
                    }
                if (got4)
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }
    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}
-
October 29th, 2013, 02:08 PM
#128
Re: Read binary file with line delimeter
Hello 2kaud,
Thanks! I've tried it and it seems to work fine, but I'll keep testing, because with one small file I got a segmentation fault and only the first line was printed. I need to check that file.
With the previous code, a 2 GB file was processed in 471 seconds (7.85 min).
The last output I'd like to get is a mapping of the substrings to columns: when a substring begins with 90, print its values in column 4; if it begins with 91, print its values in column 5; and so on. If a substring doesn't exist within the sub-block, print an empty field.
The mapping I'd like is as below.
if it begins with 90, print its values in the 4th column
if it begins with 91, print its values in the 5th column
if it begins with 9A, print its values in the 6th column
if it begins with 92, print its values in the 7th column
if it begins with 93, print its values in the 8th column
if it begins with 9B, print its values in the 9th column
if it begins with 96, print its values in the 10th column
if it begins with 97, print its values in the 11th column
if it begins with 94, print its values in the 12th column
So, the current output of your last code is:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|2,13,8147526905,0|12,13,8147526905,1,1|12,13,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|
And the desired output:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|||12,159,8147526905,1,1||14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|||||||||
65400|532064019659174|81440415266|||||||||
65401|532064019659175|81440415267||2,13,8147526905,0|||12,13,8147526905,1,1||12,13,81475269565,0||1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|||||||||
Thanks for all the help.
-
October 29th, 2013, 02:23 PM
#129
Re: Read binary file with line delimeter
If present, do the substrings beginning with 9X always occur in the order 90, 91, 9A, 92, 93, 9B, 96, 97 and 94 - or can they occur in any order with 94 always being the last?
-
October 29th, 2013, 02:41 PM
#130
Re: Read binary file with line delimeter
When they appear, 90, 91, 9A, 92, 93, 9B, 96 and 97 can occur in any order, but the 94... substring is always at the end.
-
October 29th, 2013, 03:35 PM
#131
Re: Read binary file with line delimeter
Yes, I thought you were going to say that! That complicates matters. I'll have to think about this. I'll probably have to create a vector for the substrings, indexed by the 9X code, as I can't just concatenate the output together as I do now. Hmm.
Can you confirm that for any ff79 block, the sub-blocks starting 9X can only appear once, but in any order with 94 at the end? That is, can a 91 sub-block, say, occur only once and not multiple times?
Incidentally, once you have the output mapped as per post #128, what are you going to do with it?
Last edited by 2kaud; October 29th, 2013 at 04:12 PM.
-
October 29th, 2013, 04:31 PM
#132
Re: Read binary file with line delimeter
Originally Posted by 2kaud
Incidentally, once you have the output mapped as per post #128, what are you going to do with it?
I understand that printing in that order could be more complicated. I don't know how to put it in code or change your code to do that, but my idea is to have an array A with A[90]=4, A[91]=5, etc., and an array B with 9 empty values. Then, when the first byte is 90, do B[A[x]-4] = B[A[90]-4] = B[0] = "12,13,814264845,0". This would fill element 0 of array B.
It's only an idea.
Regarding your question: since 90, 91, 9A, etc. each belong to a different category, I'd like to print the corresponding values in the same column every time; that would make the output easy to open in Excel, for example.
Thanks again for the help.
Last edited by Philidor; October 29th, 2013 at 04:34 PM.
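Philidor's two-array idea, with A mapping each 9X id to a column and B holding the 9 column values, can be sketched as below. This is a minimal sketch, assuming each sub-block has already been decoded to its comma-separated text and that a 9X id appears at most once per block. slotFor and buildColumns are hypothetical names, and A is replaced by a lookup string in the column order listed in post #128.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Like array A: maps the second hex char of a 9X id to a 0-based slot
// (slot 0 = column 4) in the order 90, 91, 9A, 92, 93, 9B, 96, 97, 94.
// Returns -1 for an id with no column in the mapping list (e.g. 95).
int slotFor(char x) {
    static const std::string order = "01A23B674";
    const std::string::size_type pos = order.find(x);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Like array B: drops each decoded sub-block (id char, text) into its slot and
// joins the 9 slots with '|', leaving absent ids empty.
// Returns false if the same 9X id appears twice in one block.
bool buildColumns(const std::vector<std::pair<char, std::string>>& decoded, std::string& row) {
    std::array<std::string, 9> slots;   // all empty to start with
    std::array<bool, 9> used{};         // value-initialised: all false
    for (const auto& d : decoded) {
        const int s = slotFor(d.first);
        if (s < 0)
            continue;                   // no column for this id
        assert(s < 9);
        if (used[s])
            return false;               // repeated id
        used[s] = true;
        slots[s] = d.second;
    }
    row.clear();
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (i != 0)
            row += '|';
        row += slots[i];
    }
    return true;
}
```

For record 65401, the decoded 91, 93, 96 and 94 sub-blocks land in columns 5, 8, 10 and 12; printing the record number, then preliminar as filled by getBlock, then this row gives the desired line from post #128.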
-
October 29th, 2013, 05:28 PM
#133
Re: Read binary file with line delimeter
Fine, but you haven't answered my question:
Can you confirm that for any individual ff79 block, the sub-blocks starting 9X can only appear once within that block, but in any order with 94 at the end? That is, can a 91 sub-block, say, occur only once and not multiple times in any one block?
Unless you say differently, I'm going to assume that each 9X can only occur once in the same ff79 block.
-
October 29th, 2013, 05:41 PM
#134
Re: Read binary file with line delimeter
Sorry, 2kaud.
Yes, each substring that begins with 9X appears only once, in any order, and the 940E... substring always appears at the end if there is at least one substring.
-
October 30th, 2013, 12:20 AM
#135
Re: Read binary file with line delimeter
Hello again,
I've tried your last code in Code::Blocks with GNU GCC and it compiles, but in Visual Studio 2013 I get a compilation error for _itoa(), saying "itoa() is not safe, you can use itoa_s() instead".
I changed every call from
Code:
_itoa(CONVDEC(6), num, 10)
to
Code:
_itoa_s(CONVDEC(6), num, sizeof(num), 10)
But it only works for some strings.
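Instead of _itoa or _itoa_s, the standard std::to_string (C++11) could be used; it is not Microsoft-specific, so the same code builds under GCC and Visual Studio. Note that _itoa_s returns an errno_t error code rather than a pointer to the buffer, so `sub += _itoa_s(...)` does not append the digits the way `sub += _itoa(...)` does. A minimal sketch, with appendDecimal as a hypothetical helper:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends 'value' in decimal to 'sub', followed by 'sep' if one is given.
// std::to_string (standard since C++11) builds its own result string, so there is no
// fixed-size buffer like num[] to size correctly and no errno_t to check.
void appendDecimal(std::string& sub, long long value, char sep = '\0') {
    sub += std::to_string(value);
    if (sep != '\0')
        sub += sep;
}
```

In the program from post #127, each `sub += _itoa(X, num, 10);` could similarly become `sub += to_string(X);` (it already has `using namespace std`), and the `num` buffer would no longer be needed.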