So if there is no 940E sequence at the end, you ignore the sub blocks? Is that right?
Try this
Code:
int main()
{
    FileFields ff;
    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }
    string header;
    ff.getField(header);
    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);
    DWORD number;
    time_t timest = time(NULL);
    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                // c + 3 < cx.size() keeps the tag and length-byte reads inside cx
                for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && (cx[c + 1] >= '0' && cx[c + 1] <= '7' /*&& cx[c + 1] != '5'*/)) {
                        // sub-block length in hex characters: tag + length byte + payload
                        int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                        if (cx[c + 1] != '5')
                            sub += cx.substr(c, slen) + '|';
                        got4 = (cx[c + 1] == '4');
                        c += slen - 2;
                    }
                // the sub-blocks are kept only if the closing 94 sub-block was found
                if (got4)
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }
    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}
Hello 2kaud,
I've tested your last code and it works: it extracts all the expected substrings.
I found that the substrings beginning with 9X can take two more values:
9X, where X = 0, 1, 2, 3, 6, 7, A, B
So a substring could begin with 9A or 9B too. I think this could be a problem, since A and B are not decimal digits.
Now comes what I think is the more difficult part. I hope I explain it well.
Each substring is composed like this:
When the second byte is 0F (15 bytes follow it), the layout is:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte
90-0F-01-02-00000030-8147526905FFFFFF-00
When the second byte is 10 (16 bytes follow it), the layout is:
1 byte + 1 byte + 1 byte + 1 byte + 4 bytes + 8 bytes + 1 byte + 1 byte
93-10-01-0C-0000000D-8147526905FFFFFF-01-01
I'd like to print from byte 4 to the end, converting each group of bytes to decimal, except the group of 8 bytes, which should be printed without conversion but without the trailing F's. For the two sample substrings the output would then be as follows.
For first sample substring:
Code:
900F0102000000308147526905FFFFFF00 --> original substring
90-0F-01-02-00000030-8147526905FFFFFF-00 --> separating in groups
02-00000030-8147526905FFFFFF-00 --> These are the groups I want to print
2,48,8147526905,0 --> separated in commas, but in decimal except the section of 8 bytes that only is needed to remove the "f´s".
For 2nd sample substring:
Code:
9310010C0000000D8147526905FFFFFF0101 --> original substring
93-10-01-0C-0000000D-8147526905FFFFFF-01-01 --> separating in groups
0C-0000000D-8147526905FFFFFF-01-01--> These are the groups I want to print
12,13,8147526905,1,1 --> separated in commas, but in decimal except the section of 8 bytes that only is needed to remove the "f´s".
And when substring is the last substring, the one that begins with 940E + 14 bytes, I want to print each individual byte of those 14 bytes, in decimal
Code:
940E0001000001000100FFFF00000101 --> original
940E-00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> Composed by 14 bytes
00-01-00-00-01-00-01-00-FF-FF-00-00-01-01 --> These 14 bytes I want to print
0,1,0,0,1,0,1,0,255,255,0,0,1,1 --> separated in commas, but in decimal
Then, currently the output with your last code using the binSmall file is:
Code:
65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009F8147526905FFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C0000000D81475269565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|
and the output expected is:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,00|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|2,13,8147526905,0|12,14,8147526905,1,1|12,14,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268
Thanks again for all the help.
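The decoding rules described in this post can be sketched as one small function. This is only an illustration under stated assumptions, not the code used in the thread: the names hexToDec and decodeSubBlock are hypothetical, the input is assumed to be upper-case hex text for a single sub-block, and C++11 is assumed for std::to_string.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: decimal text of the hex digits in s[pos, pos + len).
// Assumes the digits are valid hex and the value fits in an unsigned long.
static std::string hexToDec(const std::string& s, std::size_t pos, std::size_t len) {
    return std::to_string(std::strtoul(s.substr(pos, len).c_str(), nullptr, 16));
}

// Decodes one sub-block given as hex text, e.g. "900F0102...".
// Returns an empty string if the text is shorter than its length byte claims.
std::string decodeSubBlock(const std::string& sb) {
    if (sb.size() < 4) return "";
    // Second byte = number of payload bytes; two hex characters per byte.
    const std::size_t payload = std::strtoul(sb.substr(2, 2).c_str(), nullptr, 16) * 2;
    if (sb.size() < 4 + payload) return "";
    std::string out;
    if (sb[1] == '4') {
        // Closing 94 sub-block: every payload byte is printed in decimal.
        for (std::size_t i = 4; i < 4 + payload; i += 2) {
            if (!out.empty()) out += ',';
            out += hexToDec(sb, i, 2);
        }
        return out;
    }
    if (payload < 30) return "";   // the regular layouts carry 15 or 16 bytes
    out += hexToDec(sb, 6, 2);     // byte 4
    out += ',';
    out += hexToDec(sb, 8, 8);     // 4-byte group
    out += ',';
    // 8-byte group: copied as-is up to the first F of the padding.
    for (std::size_t i = 16; i < 32 && sb[i] != 'F'; ++i)
        out += sb[i];
    // One trailing byte (0F layout) or two (10 layout).
    for (std::size_t i = 32; i < 4 + payload; i += 2) {
        out += ',';
        out += hexToDec(sb, i, 2);
    }
    return out;
}
```

For example, decodeSubBlock("900F0102000000308147526905FFFFFF00") gives 2,48,8147526905,0, matching the first sample above.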
To accommodate 9A and 9B is trivial
Code:
for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
    size_t ff79;
    if ((ff79 = block.find(SBLOCK)) != string::npos) {
        size_t five;
        if ((five = block.find("05", ff79)) != string::npos) {
            cx = block.substr(five + 2);
            sub = "";
            bool got4 = false;
            // c + 3 < cx.size() keeps the tag and length-byte reads inside cx
            for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                    int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                    if (cx[c + 1] != '5')
                        sub += cx.substr(c, slen) + '|';
                    got4 = (cx[c + 1] == '4');
                    c += slen - 2;
                }
            if (got4)
                preliminar += sub;
        }
    }
    cout << number << preliminar << endl;
}
I'll have a look at the decomposition over the next couple of days when I have time.
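On the worry that A and B are not decimal digits: the tag is compared as text, so only the character test changes. A minimal sketch of that test, kept separate from the loop, might look like the following. The names hexVal and isSubBlockTag are hypothetical and do not appear in the thread's code.

```cpp
#include <string>

// Value of one hex digit, or -1 if c is not 0-9, A-F or a-f.
int hexVal(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// True if the two characters at pos look like a sub-block tag accepted by
// the loop above: '9' followed by 0-7, A or B. 95 is accepted here too, so
// that the caller can step over it as the loop does.
bool isSubBlockTag(const std::string& s, std::size_t pos) {
    static const std::string second = "01234567AB";
    return pos + 1 < s.size() && s[pos] == '9' &&
           second.find(s[pos + 1]) != std::string::npos;
}
```

Unlike a lookup table indexed by c - '0', hexVal cannot be indexed out of range by an unexpected character; it simply returns -1.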
For what's currently output, what's the speed like for a large file?
Code:
65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009F8147526905FFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C0000000D81475269565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|
and the output expected is:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,00|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|2,13,8147526905,0|12,14,8147526905,1,1|12,14,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268
Shouldn't the expected output for 65401 be 13 rather than the 14 highlighted - as hex D is 13 decimal?
Apart from the issue raised above in post #126, the program below produces the expected output as per your post #124. Have fun!
Code:
#include <iostream>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;
typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef unsigned long int DWORD;
#ifndef LOBYTE
#define LOBYTE(w) ((BYTE)((WORD)(w) & 0xff))
#endif
#ifndef HIBYTE
#define HIBYTE(w) ((BYTE)((WORD)(w) >> 8))
#endif
#define CONVDEC(num) (convh[cx[c + (num)] - '0'] * 16 + convh[cx[c + (num) + 1] - '0'])
const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
const int convh[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};
const WORD SEPAR = 0xFF77;
const char SBLOCK[] = "FF79";
class FileFields
{
private:
    ifstream ifs;
    bool opened;

public:
    FileFields() : opened(false) {}
    ~FileFields() {
        if (opened)
            ifs.close();
    }
    bool open(const char* name);
    bool getBlock(string& field, DWORD& number, string& firstpart, WORD delim = SEPAR);
    bool getField(string& field, WORD delim = SEPAR);
};

bool FileFields::open(const char* name) {
    ifs.open(name, ios::binary);
    return (opened = ifs.is_open());
}

// Reads one block: a 3-byte big-endian number, two 8-byte packed-digit fields
// (each cut short at the first F nibble), then the hex text up to the delimiter.
bool FileFields::getBlock(string& field, DWORD& number, string& firstpart, WORD delim)
{
    BYTE num[3],
         first[16],
         by,
         ub,
         lb;

    number = 0;
    firstpart = "|";
    if (!opened || !ifs.good())
        return false;

    ifs.read((char*)num, 3);
    number = (num[0] << 16) + (num[1] << 8) + num[2];
    if (!ifs.good())
        return false;

    ifs.read((char*)first, 16);
    for (int p = 1; p <= 2; p++) {
        const int last = p * 8;
        for (int i = (p - 1) * 8; i < last; i++)
            if ((ub = ((by = first[i]) >> 4)) < 0xf) {
                firstpart += hconv[ub];
                if ((lb = (by & 0x0f)) < 0xf)
                    firstpart += hconv[lb];
                else
                    break;
            } else
                break;
        firstpart += '|';
    }
    // pass the caller's delimiter on instead of silently using the default
    return getField(field, delim);
}

// Appends the bytes up to the two-byte delimiter to field as hex text.
bool FileFields::getField(string& field, WORD delim)
{
    char by;
    bool cont = true;

    field = "";
    if (!opened || !ifs.good())
        return false;

    for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
        if ((BYTE)by == HIBYTE(delim))
            if ((BYTE)ifs.peek() == LOBYTE(delim))
                cont = false;
        if (cont) {
            field += hconv[(BYTE)by >> 4];
            field += hconv[(BYTE)by & 0xf];
        }
    }
    return true;
}
int main()
{
    FileFields ff;
    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }
    string header;
    ff.getField(header);
    string block;
    block.reserve(7000);
    string preliminar;
    preliminar.reserve(7000);
    string cx;
    cx.reserve(7000);
    string sub;
    sub.reserve(7000);
    DWORD number;
    char num[12];   // room for a 32-bit value in decimal, a sign and the terminating NUL
    time_t timest = time(NULL);
    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;
        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;
            if ((five = block.find("05", ff79)) != string::npos) {
                cx = block.substr(five + 2);
                sub = "";
                bool got4 = false;
                // c + 3 < cx.size() keeps the tag and length-byte reads inside cx
                for (size_t c = 0; c + 3 < cx.size() && !got4; c += 2)
                    if (cx[c] == '9' && ((cx[c + 1] >= '0' && cx[c + 1] <= '7') || cx[c + 1] == 'A' || cx[c + 1] == 'B')) {
                        const int slen = CONVDEC(2) * 2;    // payload length in hex characters
                        if (c + 4 + slen > cx.size())
                            break;  // truncated sub-block: stop rather than read past the end of cx
                        if ((got4 = (cx[c + 1] == '4')))
                            // closing 94 sub-block: every payload byte in decimal
                            for (int i = 4; i < slen + 4; i += 2) {
                                sub += _itoa(CONVDEC(i), num, 10);
                                if (i != slen + 2)
                                    sub += ',';
                            }
                        else
                            // slen >= 30 guards the fixed offsets read below
                            if (cx[c + 1] != '5' && slen >= 30) {
                                sub += _itoa(CONVDEC(6), num, 10);      // byte 4
                                sub += ',';
                                int dec = 0;    // 4-byte group; assumes the value fits in a signed int
                                for (int s = 8; s < 16; s += 2)
                                    dec = (dec << 8) + CONVDEC(s);
                                sub += _itoa(dec, num, 10);
                                sub += ',';
                                // 8-byte group: copied as-is up to the first F of the padding
                                for (size_t s = c + 16; s < c + 32; s++)
                                    if (cx[s] != 'F')
                                        sub += cx[s];
                                    else
                                        break;
                                sub += ',';
                                sub += _itoa(CONVDEC(32), num, 10);
                                if (slen == 32) {
                                    sub += ',';
                                    sub += _itoa(CONVDEC(34), num, 10);
                                }
                                sub += '|';
                            }
                        c += slen + 2;
                    }
                if (got4)
                    preliminar += sub;
            }
        }
        cout << number << preliminar << endl;
    }
    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}
Hello 2kaud,
Thanks! I've tried it and it seems to work just fine, but I'll keep testing, because with one small file I got a segmentation fault
and only the first line was printed. I need to check that file.
With the previous code, a 2G file was processed in 471 seconds (7.85 min).
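The cause of the segmentation fault isn't shown in the thread. One possibility, offered only as an assumption, is a sub-block whose length byte points past the end of the cx string, so that the fixed-offset reads run off the end. A guarded length check along these lines would turn that case into a clean skip. The names readHexByte and subBlockLength are hypothetical.

```cpp
#include <string>

// Hypothetical helper: reads the hex byte at s[pos], s[pos + 1] into value.
// Returns false instead of reading out of range or accepting non-hex text.
bool readHexByte(const std::string& s, std::size_t pos, unsigned& value) {
    if (pos + 2 > s.size()) return false;
    unsigned v = 0;
    for (std::size_t i = pos; i < pos + 2; ++i) {
        const char ch = s[i];
        unsigned d;
        if (ch >= '0' && ch <= '9') d = ch - '0';
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else return false;
        v = v * 16 + d;
    }
    value = v;
    return true;
}

// Total length in hex characters (tag + length byte + payload) of the
// sub-block starting at c, or 0 if it does not fit inside cx.
std::size_t subBlockLength(const std::string& cx, std::size_t c) {
    unsigned len = 0;
    if (!readHexByte(cx, c + 2, len)) return 0;
    const std::size_t total = 4 + static_cast<std::size_t>(len) * 2;
    return (c + total <= cx.size()) ? total : 0;
}
```

A caller would test the result before touching any offset inside the sub-block, and stop scanning the block when it is 0.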
The last output I'd like to get is a mapping for the substrings. I mean: when a substring begins with 90, print its values
in column 4; if it begins with 91, print its values in column 5, and so on. If a substring doesn't exist within the sub-block, print an empty
column.
The mapping I'd like is as below.
if it begins with 90, print its values in the 4th column
if it begins with 91, print its values in the 5th column
if it begins with 9A, print its values in the 6th column
if it begins with 92, print its values in the 7th column
if it begins with 93, print its values in the 8th column
if it begins with 9B, print its values in the 9th column
if it begins with 96, print its values in the 10th column
if it begins with 97, print its values in the 11th column
if it begins with 94, print its values in the 12th column
So, the current output with your last code is:
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|12,159,8147526905,1,1|14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|2,13,8147526905,0|12,13,8147526905,1,1|12,13,81475269565,0|1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|
And desired output
Code:
65398|532064019659172|81440415264|2,48,8147526905,0|2,314,81475269559,0|||12,159,8147526905,1,1||14,235,81475269596,0|1,456345,81475269563,0|0,1,0,0,1,0,1,0,255,255,0,0,1,1
65399|532064024496121|81440415265|||||||||
65400|532064019659174|81440415266|||||||||
65401|532064019659175|81440415267||2,13,8147526905,0|||12,13,8147526905,1,1||12,13,81475269565,0||1,2,1,2,1,0,1,255,255,255,2,1,2,1
65402|532064019659176|81440415268|||||||||
Thanks for all the help.
If present, do the substrings beginning with 9X always occur in the order 90, 91, 9A, 92, 93, 9B, 96, 97 and 94 - or can they occur in any order with 94 always being the last?
When they appear, the substrings 90, 91, 9A, 92, 93, 9B, 96 and 97 can occur in any order, but the substring 94X... is always at the end.
Yes, I thought you were going to say that! That complicates matters. I'll have to think about this. I'll probably have to create a vector for the substrings with an index based upon the 9X code, as I can't just concatenate the output together as I do now. Hmm.
Can you confirm that for any ff79 block, the sub-blocks starting 9x can only appear once but in any order with 94 at the end - ie say 91 sub-block can only occur once and not multiple times?
Incidentally, once you have the output mapped as per post #128, what are you going to do with it?
I understand that printing in that order could be more complicated. I don't know how to change your code to do that, but my idea is to have an array A with A[90]=4, A[91]=5, etc., and an array B with 9 empty values. Then, when the first byte is 90, do B[A[x]-4]=B[A[90]-4]=B[0]="12,13,814264845,0". This would fill element 0 of array B.
It's only an idea.
Regarding your question: since 90, 91, 9A, etc. each belong to a different category, I'd like to print the corresponding values in the same column, which would make the output easy to open in Excel, for example.
Thanks again for the help.
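The array idea above can be sketched roughly as follows. It is only one way to do it, not the code that was eventually posted: columnSlot and mapColumns are hypothetical names, each sub-block is assumed to have been decoded to text already, and each tag is assumed to occur at most once per block.

```cpp
#include <string>
#include <utility>
#include <vector>

// Slot 0-8 for the second character of a tag, following the requested
// column order 90, 91, 9A, 92, 93, 9B, 96, 97, 94 (columns 4 to 12).
// Returns -1 for any other character.
int columnSlot(char second) {
    static const std::string order = "01A23B674";
    const std::string::size_type p = order.find(second);
    return p == std::string::npos ? -1 : static_cast<int>(p);
}

// Builds the nine mapped columns joined by '|'. Each input pair is
// (second character of the tag, decoded text); absent tags stay empty.
std::string mapColumns(const std::vector<std::pair<char, std::string>>& decoded) {
    std::vector<std::string> cols(9);
    for (const auto& d : decoded) {
        const int slot = columnSlot(d.first);
        if (slot >= 0) cols[slot] = d.second;
    }
    std::string row;
    for (std::size_t i = 0; i < cols.size(); ++i) {
        if (i) row += '|';
        row += cols[i];
    }
    return row;
}
```

Appending '|' plus the result of mapColumns to the first three fields would give rows shaped like the desired output, including the all-empty rows, because a block with no sub-blocks still yields nine empty columns.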
Fine, but you haven't answered my question
Can you confirm that for any individual ff79 block, the sub-blocks starting 9x can only appear once within that block but in any order with 94 at the end - ie say 91 sub-block can only occur once and not multiple times in any one block?
Unless you say differently, I'm going to assume that each 9X can only occur once in the same ff79 block.
Sorry 2kaud.
Yes, each substring that begins with 9X appears only once, in any order, and the substring 940EX... always appears at the end if there is at least one substring.
Hello again,
I've tried your last code in CodeBlocks with GNU GCC and it compiles, but in Visual Studio 2013 I get a compilation error for _itoa() saying "itoa() is not safe, you can use instead itoa_s()".
I changed in all cases from
Code:
_itoa(CONVDEC(6), num, 10)
to
Code:
_itoa_s(CONVDEC(6), num, sizeof(num), 10)
But is only some strings.
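One thing to keep in mind with that change: _itoa_s returns an error code rather than the buffer, so it cannot simply take the place of _itoa inside a `sub += ...` expression. A compiler-neutral alternative, shown here only as a sketch with the hypothetical name toDecimal, is to format through a standard string stream, which needs no buffer at all.

```cpp
#include <sstream>
#include <string>

// Decimal text of value, built with standard C++ only, so the same code
// should compile with both GCC and Visual Studio.
std::string toDecimal(unsigned long value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

A call such as `sub += toDecimal(CONVDEC(6));` would then replace the _itoa call. A string stream is likely slower than _itoa, which may matter for the 2G file, so it would be worth timing before adopting it.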