Re: Read binary file with line delimeter
Quote:
Originally Posted by
2kaud
For the first 2 strings, does the 'f' always appear at the end of the string, or can 'f' appear anywhere in the string? Is it just the 'f' at the end you don't want or 'f' that appears anywhere within this 16 bytes?
PS I think the 'f' always only appears at the end. Can you confirm.
PPS In the preliminar output does it start with a '|' and the various parts separated by '|'?
Hello 2kaud,
Yes, the "f"s are used to fill the substrings when the 16 bytes are not completely occupied by digits (0 to 9). The "f"s always appear from right to left within each substring, and there can be one or more of them.
Instead of "," as the output separator, "|" is the only separator I'd like to use in the output.
I've tested the code with the small file I uploaded in a previous post, and I get:
Code:
1,|532064450189340|81474549232|,0015000A48
2,|532064450189341|81474554768|,0015000A48
3,|532064450189342|81474557521|,0015000A48
4,|532064450189348|81477380427|,0015000A48
5,|532064450189349|81474663128|,0015000A48
where the first column is the block number, the 2nd column is the first string and the 3rd column is the 2nd string, but a 4th column is appearing that is not supposed to appear. The 4th column should be the first substring of the sub-block, if it exists.
So, if only the first 2 strings are currently processed, the expected output is:
Code:
1|532064450189340|81474549232
2|532064450189341|81474554768
3|532064450189342|81474557521
4|532064450189348|81477380427
5|532064450189349|81474663128
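The padding rule and the "|" separated layout described above can be sketched as follows. This is only a minimal illustration, assuming each string is 8 bytes of packed digits (two per byte, high nibble first) and that any nibble other than 0 to 9 is the 'f' filler. The names unpackDigits and formatLine are hypothetical and are not part of the code posted in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert packed digits (two per byte, high nibble first) to text,
// stopping at the first 0xF filler nibble.
// Assumption: every nibble before the filler is a digit 0-9.
std::string unpackDigits(const std::uint8_t* data, std::size_t len)
{
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        const std::uint8_t hi = data[i] >> 4;
        const std::uint8_t lo = data[i] & 0x0F;
        if (hi == 0x0F)
            break;
        out += static_cast<char>('0' + hi);
        if (lo == 0x0F)
            break;
        out += static_cast<char>('0' + lo);
    }
    return out;
}

// Build one output line: block number, first string and second string,
// joined by '|' with no leading or trailing separator.
std::string formatLine(unsigned long number, const std::uint8_t* sixteen)
{
    return std::to_string(number) + '|' + unpackDigits(sixteen, 8)
         + '|' + unpackDigits(sixteen + 8, 8);
}
```

For example, the 16 bytes 53 20 64 45 01 89 34 2F 81 47 45 57 52 1F FF FF with block number 3 would give the line 3|532064450189342|81474557521.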
Thanks for the help 2kaud.
Re: Read binary file with line delimeter
For test purposes, within the for loop in main() I just output the first 10 chars of the string block as the 4th column for the first 20 blocks, to check that things were working properly. Modify the processing within the for loop as required.
Code:
for (DWORD blk = 1; ff.getBlock(block, number, firstpart); blk++) {
    //if (blk < 20 )
        cout << number << firstpart << endl;
    //else
    //    break;

    //if (number != blk)
        //cout << "block " << blk << endl << number << "," << block.size() << "," << block.substr(0, 16) << endl;
}
I'll have a look at the rest over the next couple of days.
Re: Read binary file with line delimeter
Re sub strings. The only time I can find ff79 in the provided file is when it is part of the block number following a ff77??
Re: Read binary file with line delimeter
Quote:
Originally Posted by
2kaud
Re sub strings. The only time I can find ff79 in the provided file is when it is part of the block number following a ff77??
Hello 2kaud,
Certainly there is a block number FF79, but this correlative (the sequential block number) shouldn't be confused with the FF79 that marks where a new sub-block begins within each block. So, once the first 2 strings have been processed, as your last code already does, an FF79 found after those 2 strings belongs to a sub-block; if an FF77 is found before any FF79, it means that there is no sub-block.
It could happen as below:
Code:
09373ff7700ff78532064450189342f81474557521fffff0015000a4800015a0
0014200016000013300013600013700015b00016600016500017700016900006
a00007900009300012200002100010900012600010800012b00002c00012d000
12e00015500015600072a00002f0000300000310000ff7900800932c90600000
000a000800935c90600000000000080093cc906000000008000800943c906000
00000800005910f01020000000d8147526905ffffff009310010c0000000d814
7526905ffffff0101960f010c0000000d81475269565fffff00940e010201020
10001ffffff02010201950600000000000005820037060100000100650000000
2000002001800000003000003001700000004000004000100000005000005001
50000000a00ffff006500000007802ec918009181475269555fffffff0091814
75269555fffff000103ca03001cfecb0a00000000000000000000cc0101811bc
90b009181475269557fffffffca06000000000000cb010bcc0101ff7700ff795
32064019857386f50440429469fffff0015000a4800015a00144200013300013
600013700017e00016900017900009300012200002100012600011000012b000
02c00002d00002e00005500005600072a00002f0000300000310000ff7900800
932c9060000a0000000800935c90600000000000080093cc9060000000080008
00943c90600000000800005910f01020000000d8147526905ffffff009310010
c0000000d8147526905ffffff0101960f010c0000000d81475269565fffff009
40e01020102010001ffffff02010201950600000000000005820037060100000
10065000000020000020
So, the second block begins with FF77 followed by the correlative 00FF79, I mean FF7700FF79..
But after this, an FF79 appears within the block. This FF79 is the beginning of the sub-block and should not be confused with the correlative that comes immediately after FF77.
It could also happen like FF770FF790532064...
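One way to keep the correlative and the sub-block marker apart, sketched under some assumptions: the block is held as upper-case hex text that starts right after the FF77 delimiter, the block number takes 3 bytes and the two strings take 16 bytes, so the search for the marker starts after those 19 bytes. The sketch also only accepts matches on a byte boundary, since a plain text search could match across two neighbouring bytes. The names findAligned and kHeaderHexChars are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hex characters taken by the 3-byte block number plus the two 8-byte strings.
// Assumption: the hex text starts immediately after the FF77 delimiter.
const std::size_t kHeaderHexChars = (3 + 16) * 2;

// Find `marker` in a hex dump, accepting only matches that start on a byte
// boundary (even character offset) at or after `from`.
// Returns std::string::npos when there is no such match.
std::size_t findAligned(const std::string& hex, const std::string& marker,
                        std::size_t from = 0)
{
    std::size_t pos = hex.find(marker, from);
    while (pos != std::string::npos && pos % 2 != 0)
        pos = hex.find(marker, pos + 1);   // odd offset: straddles two bytes
    return pos;
}
```

With a block whose number is 00FF79, findAligned(block, "FF79", kHeaderHexChars) skips the correlative and reports only a later FF79, if there is one.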
Thanks in advance,
Re: Read binary file with line delimeter
Sorry, but in the data I have from your downloaded zip file, I don't see this pattern.
Code:
FF 77 00 FF 78 53 20 64 01 96 59 17 4F 50 44 04 15 26 6F FF FF 00 15 00 0A 48 00 01
5A 00 14 42 00 01 33 00 01 36 00 01 37 00 01 7E 00 01 69 00 01 79 00 00 93 00 01 22
00 00 21 00 01 26 00 01 10 00 01 2B 00 00 2C 00 00 2D 00 00 2E 00 00 55 00 00 56 00
07 2A 00 00 2F 00 00 30 00 00 31 00 00 FF 34 00 80 09 32 C9 06 00 00 A0 00 00 00 80
09 35 C9 06 00 00 A0 00 00 00 80 09 3C C9 06 00 00 A0 00 00 00 FF 77 00 FF 79
Where you are expecting to see ff79, I get ff34 as highlighted above.
Re: Read binary file with line delimeter
Hello 2kaud,
Yes, I see that in that file there is FF34 instead of FF79. For that file the beginning of a sub-block is FF34, each substring begins with 8X (where X = 0, 1, 2, 3, 6, 7), and instead of "05" there is "03". The rest is the same.
The sample binary.txt I uploaded before is correct, with FF79 as the beginning of each sub-block. But I'll try to upload a small sample file that has FF79 as the beginning of each sub-block and that has a correlative FF79 too.
Re: Read binary file with line delimeter
Hello 2kaud,
Attached a small zipped file that contains the patterns mentioned. Contains correlative FF79 and begin of sub-block FF79 too.
Best regards
Re: Read binary file with line delimeter
In the binary.txt file, ff79 does indeed mark the beginning of each sub-block. However, for the first part of the requirement, the length of the first section is 9 bytes, not 8!
Code:
FF 77 00 00 01 53 20 64 45 01 89 34 55 0F 81 47 45 49 23 2F FF FF
2 byte starter (ff77), 3 byte block number (000001), 9 byte first part (53206445018934550f) then 8 byte second part (81474549232fffff).
The way I have written getBlock() requires fixed 8 byte then 8 byte blocks. If the number of bytes in these two initial blocks can vary, it will mean a code rewrite and potentially slower performance.
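To make the fixed layout concrete, here is a small sketch that checks and splits a block header held in memory. It assumes exactly the layout described above for a correct file: 2-byte starter FF77, 3-byte block number (most significant byte first), then two 8-byte parts. BlockHeader and parseHeader are hypothetical names used only for this illustration; they are not part of the FileFields class.

```cpp
#include <cstddef>
#include <cstdint>

// View of one block header inside a larger buffer.
struct BlockHeader {
    std::uint32_t number;          // 3-byte block number
    const std::uint8_t* first;     // 8 bytes, first string
    const std::uint8_t* second;    // 8 bytes, second string
};

// Returns false if the buffer is too short or does not start with FF 77.
bool parseHeader(const std::uint8_t* p, std::size_t len, BlockHeader& out)
{
    const std::size_t needed = 2 + 3 + 8 + 8;
    if (len < needed)
        return false;
    if (p[0] != 0xFF || p[1] != 0x77)
        return false;

    out.number = (static_cast<std::uint32_t>(p[2]) << 16)
               | (static_cast<std::uint32_t>(p[3]) << 8)
               |  static_cast<std::uint32_t>(p[4]);
    out.first  = p + 5;
    out.second = p + 13;
    return true;
}
```

Because the offsets are fixed, a file with a 9-byte first part would be split in the wrong place, which is why the layout has to be the same in every file.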
Re: Read binary file with line delimeter
Quote:
Originally Posted by
Philidor
Hello 2kaud,
Attached a small zipped file that contains the patterns mentioned. Contains correlative FF79 and begin of sub-block FF79 too.
Best regards
This file looks better. It has the ff79 and two 8-byte sections at the beginning of each block. I'll try out some code for the second requirement over the next couple of days.
Note my comment from my previous post. If you try to process a file that has a different layout (however small the difference), the program won't produce the expected output.
Re: Read binary file with line delimeter
You're right 2kaud, that file has 9 bytes in the first part. That is an error in that file. The correct layout is 2 strings of 8 bytes.
The last sample file is correct in that respect and can be used for testing.
Thanks so much again
Re: Read binary file with line delimeter
Try this. I think it outputs what you're after.
Code:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <ctime>
using namespace std;

typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef unsigned long int DWORD;

#ifndef LOBYTE
#define LOBYTE(w) ((BYTE)((WORD)(w) & 0xff))
#endif
#ifndef HIBYTE
#define HIBYTE(w) ((BYTE)((WORD)(w) >> 8))
#endif

// Nibble value -> hex character
const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
// (hex character - '0') -> nibble value; only valid for '0'-'9' and 'A'-'F'
const int convh[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};

const WORD SEPAR = 0xFF77;      // block delimiter
const char SBLOCK[] = "FF79";   // start of sub-block, as hex text

class FileFields
{
private:
    ifstream ifs;
    bool opened;

public:
    FileFields() : opened(false) {}
    ~FileFields() {
        if (opened)
            ifs.close();
    }

    bool open(const char* name);
    bool getBlock(string& field, DWORD& number, string& firstpart, WORD delim = SEPAR);
    bool getField(string& field, WORD delim = SEPAR);
};

bool FileFields::open(const char* name) {
    ifs.open(name, ios::binary);
    return (opened = ifs.is_open());
}

// Reads one block: 3-byte number, two 8-byte strings, then the rest of the
// block (as hex text in field) up to the next delimiter.
bool FileFields::getBlock(string& field, DWORD& number, string& firstpart, WORD delim)
{
    BYTE num[3],
         first[16],
         by,
         ub,
         lb;

    number = 0;
    firstpart = "|";

    if (!opened || !ifs.good())
        return false;

    // 3-byte block number, most significant byte first
    ifs.read((char*)num, 3);
    if (!ifs.good())
        return false;

    number = (num[0] << 16) + (num[1] << 8) + num[2];

    // Two 8-byte strings of packed digits, padded on the right with 'f' nibbles
    ifs.read((char*)first, 16);
    if (!ifs.good())
        return false;

    for (int p = 1; p <= 2; p++) {
        const int last = p * 8;

        // Copy digits until the first 'f' nibble
        for (int i = (p - 1) * 8; i < last; i++)
            if ((ub = ((by = first[i]) >> 4)) < 0xf) {
                firstpart += hconv[ub];
                if ((lb = (by & 0x0f)) < 0xf)
                    firstpart += hconv[lb];
                else
                    break;
            } else
                break;

        firstpart += '|';
    }

    return getField(field, delim);
}

// Reads bytes up to (and consuming) the next delimiter, returning them as hex text
bool FileFields::getField(string& field, WORD delim)
{
    char by;
    bool cont = true;

    field = "";

    if (!opened || !ifs.good())
        return false;

    for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
        if ((BYTE)by == HIBYTE(delim))
            if ((BYTE)ifs.peek() == LOBYTE(delim))
                cont = false;

        if (cont) {
            field += hconv[(BYTE)by >> 4];
            field += hconv[(BYTE)by & 0xf];
        }
    }

    return true;
}

int main()
{
    FileFields ff;

    //if (!ff.open("d:\\philidor\\bin2g")) {
    if (!ff.open("d:\\philidor\\binsmall")) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    // Skip everything before the first delimiter
    string header;
    ff.getField(header);

    string block;
    block.reserve(7000);

    string preliminar;
    preliminar.reserve(7000);

    DWORD number;
    time_t timest = time(NULL);

    for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
        size_t ff79;

        if ((ff79 = block.find(SBLOCK)) != string::npos) {
            size_t five;

            if ((five = block.find("05", ff79)) != string::npos) {
                string cx = block.substr(five + 2);

                // c + 3 < size keeps the length lookup inside the string
                for (size_t c = 0; c + 3 < cx.size(); c += 2)
                    if (cx[c] == '9' && (cx[c + 1] >= '0' && cx[c + 1] <= '7' && cx[c + 1] != '5')) {
                        // Length byte counts payload bytes; +4 for the tag and length characters
                        int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;
                        preliminar += cx.substr(c, slen) + '|';
                        c += slen - 2;
                    }
            }
        }

        cout << number << preliminar << endl;
    }

    cout << "Time taken: " << time(NULL) - timest << endl;
    return 0;
}
Re: Read binary file with line delimeter
Hello 2kaud,
Thanks for the great help!
It is getting the strings, but it is getting other strings too. For the first 4 blocks it seems to get the correct strings, but for the last block it shouldn't get any substring. The strings I want are always after "059X" and before 9506 (excluded).
So the string starts 059XAABBCC + some characters +.. + .. 940E + 14 bytes + 9506+..+.
The string 940E + 14 bytes is always the last string. I don't need the string that begins with 9506; it is only for reference.
For more details each substring is composed like this:
9XAABBCC + some characters
Where
X = 0,1,2,3,6,7
AA = 15 to 141 (in hex 0F to 8D)
BB = 01 to 10 (in hex 01 to 10)
CC = 00 to 255 (in hex 00 to FF)
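The ranges listed above can be turned into a quick header check, sketched below. It assumes the data is available as hex text, that X is limited to 0, 1, 2, 3, 6, 7 (so the final 940E string has to be handled separately), and that the BB range is meant in hex (01 to 10); that last point is an assumption, since the decimal and hex ranges given for BB don't match. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit (either case), or -1 if it is not a hex digit.
int hexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Byte encoded by the two hex characters at pos, or -1 if not available.
int byteFromHex(const std::string& hex, std::size_t pos)
{
    if (pos + 2 > hex.size())
        return -1;
    const int hi = hexDigit(hex[pos]);
    const int lo = hexDigit(hex[pos + 1]);
    return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

// True if the 4 bytes at pos look like a 9X AA BB CC substring header.
bool looksLikeSubstring(const std::string& hex, std::size_t pos)
{
    const int tag = byteFromHex(hex, pos);
    const int aa  = byteFromHex(hex, pos + 2);
    const int bb  = byteFromHex(hex, pos + 4);
    const int cc  = byteFromHex(hex, pos + 6);   // any value 00..FF is accepted
    if (tag < 0 || aa < 0 || bb < 0 || cc < 0)
        return false;
    if ((tag >> 4) != 9)
        return false;

    const int x = tag & 0x0F;
    const bool xOk = x == 0 || x == 1 || x == 2 || x == 3 || x == 6 || x == 7;
    return xOk && aa >= 0x0F && aa <= 0x8D && bb >= 0x01 && bb <= 0x10;
}
```

With this check, a header such as 900F0102 is accepted, while a run such as 91814952 is rejected because its BB byte (49) is outside the range.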
Thanks for the help.
Re: Read binary file with line delimeter
Can you attach a sample file with the expected output? The posted code works with the previous sample data in post #112.
Re: Read binary file with line delimeter
The supplied data doesn't have any '95' data blocks, so I couldn't test that case. I've slightly changed the special-case handling. Does this version produce only the strings required? Please attach a sample file that covers all the cases.
Code:
for (DWORD blk = 1; ff.getBlock(block, number, preliminar); blk++) {
    size_t ff79;

    if ((ff79 = block.find(SBLOCK)) != string::npos) {
        size_t five;

        if ((five = block.find("05", ff79)) != string::npos) {
            string cx = block.substr(five + 2);

            // c + 3 < size keeps the length lookup inside the string
            for (size_t c = 0; c + 3 < cx.size(); c += 2)
                if (cx[c] == '9' && (cx[c + 1] >= '0' && cx[c + 1] <= '7' /*&& cx[c + 1] != '5'*/)) {
                    int slen = (convh[(cx[c + 2] - '0')] * 16 + convh[cx[c + 3] - '0']) * 2 + 4;

                    // '95' strings are skipped, '94' is the last string wanted
                    if (cx[c + 1] != '5')
                        preliminar += cx.substr(c, slen) + '|';

                    if (cx[c + 1] == '4')
                        break;

                    c += slen - 2;
                }
        }
    }

    cout << number << preliminar << endl;
}
Re: Read binary file with line delimeter
Hello 2kaud,
The thing is that not all substrings always appear; sometimes only 91XXXXX, or 90XXXX and 93XXXXXX, could appear. So one or more substrings could appear in each sub-block.
With the last code using binSmall.txt I get this:
Code:
65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009FFFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101|
65399|532064024496121|81440415265|
65400|532064019659174|81440415266|
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C000000565FFFFF00|940E01020102010001FFFFFF02010201|
65402|532064019659176|81440415268|9181495269539FFFFFFF009181495269539FFFFF000103CA030808FECB0A000000000100000000010002|
And the expected output is:
Code:
65398|532064019659172|81440415264|900F0102000000308147526905FFFFFF00|910F01020000013A81475269559FFFFF00|9310010C0000009FFFFFFF0101|960F010E000000EB81475269596FFFFF00|970F01010006F69981475269563FFFFF00|940E0001000001000100FFFF00000101
65399|532064024496121|81440415265
65400|532064019659174|81440415266
65401|532064019659175|81440415267|910F01020000000D8147526905FFFFFF00|9310010C0000000D8147526905FFFFFF0101|960F010C000000565FFFFF00|940E01020102010001FFFFFF02010201
65402|532064019659176|81440415268
So, it seems that for the last block a string beginning with 918149.. is being printed in the 4th column. That string shouldn't appear: even though there is an FF79 (there is a sub-block), the sequence 059X + ..+.. 940E + 14 bytes is not present.
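A stricter extraction could be sketched as below. This is not the code posted in the thread, only an illustration under these assumptions: the sub-block is given as hex text starting on a byte boundary, each substring is a tag byte, a length byte and that many payload bytes, the chain must start right after a 05 byte with a tag 9X (X = 0, 1, 2, 3, 6, 7), and it only counts if it ends with a 94 string whose length byte is 0E. If no such chain exists, nothing is returned for the block. All function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

int hexVal(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Byte encoded by the two hex characters at pos, or -1 if not available.
int byteAt(const std::string& hex, std::size_t pos)
{
    if (pos + 2 > hex.size())
        return -1;
    const int hi = hexVal(hex[pos]);
    const int lo = hexVal(hex[pos + 1]);
    return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

bool wantedTag(int tag)
{
    return tag == 0x90 || tag == 0x91 || tag == 0x92 || tag == 0x93 ||
           tag == 0x96 || tag == 0x97;
}

// Walk tag/length/payload strings from pos. Succeeds only if the chain
// ends with a 94 string whose length byte is 0E (14 payload bytes).
bool walkChain(const std::string& hex, std::size_t pos, std::vector<std::string>& out)
{
    out.clear();
    for (;;) {
        const int tag = byteAt(hex, pos);
        const int len = byteAt(hex, pos + 2);
        if (tag < 0 || len < 0)
            return false;
        const std::size_t chars = static_cast<std::size_t>(len) * 2 + 4;
        if (pos + chars > hex.size())
            return false;
        if (tag == 0x94) {
            if (len != 0x0E)
                return false;
            out.push_back(hex.substr(pos, chars));
            return true;
        }
        if (!wantedTag(tag))
            return false;
        out.push_back(hex.substr(pos, chars));
        pos += chars;
    }
}

// Returns the wanted substrings, or an empty vector if there is no
// complete 05 9X ... 940E chain in the sub-block.
std::vector<std::string> extractSubstrings(const std::string& subBlockHex)
{
    std::vector<std::string> found;
    for (std::size_t pos = 0; pos + 2 <= subBlockHex.size(); pos += 2) {
        if (byteAt(subBlockHex, pos) != 0x05)
            continue;
        if (!wantedTag(byteAt(subBlockHex, pos + 2)))
            continue;
        if (walkChain(subBlockHex, pos + 2, found))
            return found;
    }
    found.clear();
    return found;
}
```

In the posted loop, something like extractSubstrings(block.substr(ff79 + 4)) could supply the strings to join with '|'. A sub-block that only contains a run such as 059181495269539F... would then give no 4th column, because the chain never reaches a 940E string.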