-
April 17th, 2011, 07:38 AM
#1
Binary Pattern Search
Hi there,
How would I go about searching a binary file for the pattern '00 00 00 08 00' and then reading the 8 bytes after it into a variable?
Any ideas?
Thanks
-
April 17th, 2011, 12:49 PM
#2
Re: Binary Pattern Search
Perhaps you could use regular expressions, sorta like the following ...
Code:
public partial class Form1 : Form
{
    ASCIIEncoding e = new ASCIIEncoding();
    string binaryString;
    string result;
    byte[] byteRep = new byte[] {0x37, 0x03, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00,
                                 0x2f, 0x32, 0x2d, 0x1b, 0x0d, 0x2d, 0x03, 0x27,
                                 0x01, 0x02, 0x03, 0x04, 0x05, 0x22, 0x21, 0x20,
                                 0x1f, 0x1e, 0x1d, 0x1c, 0x03, 0x04, 0x05};
    // here's the interesting stuff --- ignore the little man behind the curtain (above)
    Regex binarySearch = new Regex("\x00\x00\x00\x08\x00(?<binaryText>........)");
    MatchCollection matchBinary;

    public Form1() {
        InitializeComponent();
        // you would do this statement in your read function
        // ignore this statement - it just sets up the example
        binaryString = e.GetString(byteRep);
        matchBinary = binarySearch.Matches(binaryString);
        // hexDump is a helper (not shown) that formats the capture as hex text
        foreach (Match match in matchBinary)
            result = hexDump(match.Groups["binaryText"].Value);
    }
}
Much of the above can be ignored ... it's used just to set up the example. The thrust of the suggestion is that the binary data is read in, taking the form of a string.
One defines the Regex using something similar to the declaration of "Regex binarySearch" above, and uses something similar to "MatchCollection matchBinary" to collect the captures.
Then one uses the foreach ... statement block to process each of the captures (assuming there'll be more'n one match in the file). Here on this machine I converted (ASCII-ized) the binary data to a hex-dump string and displayed it on the form.
I dunno ... might work ... just a suggestion ... worth exactly what you paid for it.
bill
Last edited by ThermoSight; April 17th, 2011 at 12:57 PM.
-
April 17th, 2011, 01:32 PM
#3
Re: Binary Pattern Search
Unfortunately regex only works on strings, and you cannot safely round-trip an arbitrary byte array through a string. Any byte that the chosen encoding can't represent is silently replaced (Encoding.ASCII turns every byte above 0x7F into '?', which is 0x3F). Run this test and you'll see what I mean: the two sequences are not the same, because they've been corrupted by the string conversion:
public static void Test ()
{
    var bytes = new byte [1024];
    for (int i = 0; i < bytes.Length; i ++)
        bytes [i] = (byte) i;
    var result = Encoding.ASCII.GetBytes (Encoding.ASCII.GetString (bytes));
    for (int i = 0; i < bytes.Length; i ++)
        if (bytes [i] != result [i])
            Console.WriteLine ("Corrupt at: {0}", i);
}
Simplest thing to do is to create a 5 byte array, check if it matches. If it does not, move every element to the left by 1 and read one more byte into the end. Then repeat until you find a match or you reach the end of the stream.
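A minimal sketch of that sliding-window idea, for anyone who wants something to start from. The class and method names (StreamPatternSearch, FindEndOfPattern) are made up for illustration and are not from any library. It reads one byte at a time, so wrap a FileStream in a BufferedStream if speed matters.

```csharp
using System;
using System.IO;

public static class StreamPatternSearch
{
    // Hypothetical helper: returns the stream offset of the first byte AFTER
    // the first occurrence of pattern, or -1 if the pattern never occurs.
    public static long FindEndOfPattern(Stream stream, byte[] pattern)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("pattern must not be empty");

        byte[] window = new byte[pattern.Length];
        int filled = 0;     // number of valid bytes currently in the window
        long position = 0;  // number of bytes consumed from the stream so far
        int next;

        while ((next = stream.ReadByte()) != -1)
        {
            position++;
            if (filled < window.Length)
            {
                window[filled++] = (byte)next;
            }
            else
            {
                // Move every element one place to the left, then append the new byte.
                Array.Copy(window, 1, window, 0, window.Length - 1);
                window[window.Length - 1] = (byte)next;
            }

            if (filled == window.Length && Matches(window, pattern))
                return position;
        }
        return -1;
    }

    private static bool Matches(byte[] window, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
            if (window[i] != pattern[i])
                return false;
        return true;
    }
}
```

When the method returns a non-negative offset, the stream is positioned just past the pattern, so the next 8 bytes can be read directly (for example with BinaryReader.ReadBytes(8)).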
www.monotorrent.com For all your .NET bittorrent needs
NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.
-
April 17th, 2011, 05:13 PM
#4
Re: Binary Pattern Search
Hi Mutant ...
you are correct, Of Course, Sir.
Indeed, after some time away from the desk, I returned a few minutes ago to play with this a bit and found that it seems that values are capped at 0x3f (or so it seemed in my case - which seems strange ... if MS was going to limit values, then I would have expected 0x7f). It was just coincidence that I didn't try binary values greater than 0x3f in my previous post.
Bummer, Man! My beloved Regex is restricted by the .NET framework. Bum-mer !
but perhaps we can flip the problem around a bit. The following seems to work ....
Code:
StringBuilder bldr = new StringBuilder();
string hexFormatString;
string result;
byte[] fileData = new byte[] {0xFe, 0x03, 0xCD, 0x00, 0x00, 0x00, 0x08, 0x00,
                              0xf2, 0x32, 0xcd, 0xfb, 0x0d, 0x2d, 0x03, 0x27,
                              0x01, 0x02, 0x03, 0x04, 0x05, 0x22, 0x21, 0x20,
                              0x1f, 0x1e, 0x1d, 0x1c, 0x03, 0x04, 0x05};
byte[] tmp = new byte[8];
byte[] tmp2 = new byte[8];  // buffer for the alternative recovery procedure
Regex hexPattern = new Regex("0000000800(?<binaryText>.{16})");
MatchCollection matchBinary;

public Form1() {
    InitializeComponent();
    // create a string from the binary input
    // in hex-dump format
    foreach (byte thisbyte in fileData) {
        bldr.Append(thisbyte.ToString("x2"));
    }
    hexFormatString = bldr.ToString();
    // then search the hex-dump-format string for the
    // pattern he seeks. 'Capture' the eight bytes which
    // follow the pattern.
    matchBinary = hexPattern.Matches(hexFormatString);
    // RECOVERY PROCEDURE
    // then for each 'capture' recover the captured bytes
    // as hex bytes (rather than their string representation)
    // (NumberStyles lives in System.Globalization)
    foreach (Match match in matchBinary) {
        result = match.Groups["binaryText"].Value;
        for (int i = 0; i < tmp.Length; i++)
            tmp[i] = byte.Parse(result.Substring(2*i, 2), NumberStyles.HexNumber);
    }
    // OR
    // THE ALTERNATIVE RECOVERY PROCEDURE
    // (assumes the match started on a byte boundary, i.e. at an even
    // character index in the hex string)
    int idx;
    foreach (Match match in matchBinary) {
        idx = match.Groups["binaryText"].Index / 2;
        for (int i = 0; i < tmp2.Length; i++)
            tmp2[i] = fileData[idx++];
    }
}
Thanks.
bill
Last edited by ThermoSight; April 17th, 2011 at 11:31 PM.
-
April 18th, 2011, 04:15 AM
#5
Re: Binary Pattern Search
Using regex requires you to use a string. This means you need to read the entire binary blob into memory in one go. You'd then have to convert it to a string which takes up four times the memory of the original binary blob. Finally you have to run the regex and grab your output. Sure, it'll work, but it's pretty slow and inefficient so you could only really do it if your binary blob is small and you're only doing it a handful of times. Otherwise you're going to need to do it properly.
-
April 18th, 2011, 10:59 PM
#6
Re: Binary Pattern Search
ho-hum ...
Code:
namespace Bozo
{
    public partial class Form1 : Form {
        StringBuilder bldr = new StringBuilder();
        byte[] readFromFile;
        Regex hexPattern = new Regex("0000000800(?<binaryText>.{16})");
        MatchCollection matchBinary;
        FileStream bStream = new FileStream("C:\\Bozo.bin", FileMode.Open, FileAccess.Read);
        BinaryReader fruity;

        public Form1() {
            InitializeComponent();
            using (fruity = new BinaryReader(bStream)) {
                readFromFile = fruity.ReadBytes(31);
            }
            // create a hex-dump formatted string
            // from the binary file input data
            foreach (byte thisbyte in readFromFile) {
                bldr.Append(thisbyte.ToString("x2"));
            }
            // then search the hex-dump-format string for the
            // pattern he seeks. 'Capture' the eight bytes which
            // follow the pattern.
            matchBinary = hexPattern.Matches(bldr.ToString());
            // No recovery required ... just index the data bytes
            // in 'readFromFile' using the index supplied by Regex.
        }// end constructor
    }// end class Form1
}// end Namespace 'Bozo'
"Using regex requires you to use a string. This means you need to read the entire binary blob into memory in one go"
I don't think so ... certainly I did in this case, but there are only 31 bytes ... If there were a large number of bytes, I think one could read it in segments
"You'd then have to convert it to a string which takes up four times the memory of the original binary blob."
Certainly it would take more space, at least twice, but four times ? But if one is reading in small segments who really cares if the buffer is 1K, 2K or 4K bytes ?
"Sure, it'll work, but it's pretty slow and inefficient so you could only really do it if your binary blob is small "
I grant you that the conversion to a string is inefficient, but IMHO it's a small price to pay for the pleasure (and reliability) of working with Regular Expressions. And once the string is built, it's but a single statement to separate the wheat from the chaff ... and there's really no need for recovery ... Regular Expressions has already done the indexing for the user.
That's what I call Instant Gratification.
Bye for now, 'Fruit!
Last edited by ThermoSight; April 18th, 2011 at 11:08 PM.
-
April 19th, 2011, 04:35 AM
#7
Re: Binary Pattern Search
"Certainly it would take more space, at least twice, but four times ?"
A byte is 8 bits. A char is 16 bits. You convert each byte (8 bits) into two hex chars (2 x 16 = 32 bits), therefore using 4 times the memory.
"But if one is reading in small segments who really cares if the buffer is 1K, 2K or 4K bytes ?"
Which now means you have to handle the case where 1/2 your pattern is at the end of your buffer and 1/2 your pattern is at the start of the next buffer (whenever you read that). .NET regex has no streaming API unfortunately, so it's not so simple to chunk read things.
It's also worth pointing out that your regex can be trivially outperformed by a simple "string.IndexOf (string)" if you're going to go the route of converting your byte array to a string. This is a case where regex is definitely not required and shouldn't be used.
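For what it's worth, here is a rough sketch of the IndexOf route on a hex string. HexStringSearch and IndexOfPattern are names invented for this example. One thing it has to guard against, which a plain regex over the hex text does not, is a hit that starts at an odd character index: such a hit straddles two bytes and is not a real match.

```csharp
using System;
using System.Text;

public static class HexStringSearch
{
    // Hypothetical helper: returns the byte index at which pattern starts
    // in data, or -1 if it does not occur.
    public static int IndexOfPattern(byte[] data, byte[] pattern)
    {
        string haystack = ToHex(data);
        string needle = ToHex(pattern);

        int at = haystack.IndexOf(needle, StringComparison.Ordinal);
        // A hit at an odd character index begins in the middle of a byte, so skip it.
        while (at != -1 && at % 2 != 0)
            at = haystack.IndexOf(needle, at + 1, StringComparison.Ordinal);

        return at == -1 ? -1 : at / 2;
    }

    // Two lowercase hex characters per byte, as in the earlier posts.
    private static string ToHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }
}
```

It still pays the memory cost of the string conversion, so it only makes sense for small buffers.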
Last edited by Mutant_Fruit; April 19th, 2011 at 04:37 AM.
-
April 19th, 2011, 03:01 PM
#8
Re: Binary Pattern Search
 Originally Posted by Mutant_Fruit
Simplest thing to do is to create a 5 byte array, check if it matches.
Okay, but how would I see if it matches the hex value?
All I really want is the offset of the last value of each match. If you understand me.
-
April 19th, 2011, 07:14 PM
#9
Re: Binary Pattern Search
Code:
if (array [0] == 0x0 && array [1] == 0x0 && array [2] == 0x0 && array [3] == 0x8 && array [4] == 0x0)
    Console.WriteLine ("WE HAVE A MATCH!!!!");
Bing
-
April 21st, 2011, 03:34 PM
#10
Re: Binary Pattern Search
Cheers Mutant Fruit, it works well.
Is there any way that I could retrieve the offset of the last byte read, if there is a match?
Last edited by Jay_bo; April 21st, 2011 at 03:57 PM.
-
April 21st, 2011, 06:27 PM
#11
Re: Binary Pattern Search
Code:
//Returns the index of the last byte of the first match, if any; otherwise -1
public static int indexOfMatch(byte[] data, byte[] pattern)
{
    for(int i = 0; i < data.Length - pattern.Length + 1; i++)
    {
        bool isMatch = true;
        for(int j = 0; j < pattern.Length; j++)
        {
            if( data[i+j] != pattern[j] )
            {
                isMatch = false;
                break;
            }
        }
        if( isMatch )
            return i + pattern.Length - 1; //Return the index of the last byte of the matched pattern
    }
    return -1;
}
Should just about do it for the general case?
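To tie this back to the original question (read the 8 bytes after each match), the same loop could be extended roughly as follows. This is only a sketch: PatternReader and ReadAfterEachMatch are hypothetical names, and it assumes the whole file is already in a byte array.

```csharp
using System;
using System.Collections.Generic;

public static class PatternReader
{
    // Hypothetical helper: for every occurrence of pattern in data, copies the
    // 'count' bytes that follow it. A match with fewer than 'count' bytes
    // after it is skipped.
    public static List<byte[]> ReadAfterEachMatch(byte[] data, byte[] pattern, int count)
    {
        List<byte[]> results = new List<byte[]>();
        for (int i = 0; i + pattern.Length + count <= data.Length; i++)
        {
            bool isMatch = true;
            for (int j = 0; j < pattern.Length; j++)
            {
                if (data[i + j] != pattern[j])
                {
                    isMatch = false;
                    break;
                }
            }
            if (!isMatch)
                continue;

            // Copy the bytes immediately after the matched pattern.
            byte[] following = new byte[count];
            Array.Copy(data, i + pattern.Length, following, 0, count);
            results.Add(following);
        }
        return results;
    }
}
```

With the 31-byte sample array from post #4 and the pattern 00 00 00 08 00, this should give back a single result starting with 0xf2 and ending with 0x27.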
Best Regards,
BioPhysEngr
http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.