  1. #1
    Join Date
    Apr 2011
    Posts
    4

    Binary Pattern Search

    Hi there,

    How would I go about searching a binary file for the pattern '00 00 00 08 00' and then reading the 8 bytes after it into a variable?

    Any ideas?

    Thanks

  2. #2
    Join Date
    Oct 2005
    Location
    Seattle, WA U.S.A.
    Posts
    353

    Re: Binary Pattern Search

    Perhaps you could use regular expressions, sorta like the following ...

    Code:
        public partial class Form1 : Form
        {
            ASCIIEncoding e = new ASCIIEncoding();
            string binaryString;
            string result;
            byte[] byteRep = new byte[] {0x37, 0x03, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 
                                         0x2f, 0x32, 0x2d, 0x1b, 0x0d, 0x2d, 0x03, 0x27, 
                                         0x01, 0x02, 0x03, 0x04, 0x05, 0x22, 0x21, 0x20,
                                         0x1f, 0x1e, 0x1d, 0x1c, 0x03, 0x04, 0x05};
    
            // here's the interesting stuff --- ignore the little man behind the curtain (above)
            Regex binarySearch = new Regex("\x00\x00\x00\x08\x00(?<binaryText>........)");
            MatchCollection matchBinary;
    
    
    
             public Form1() {
                InitializeComponent();
    
                // you would do this statement in your read function
                // ignore this statement - it just sets up the example
                binaryString = e.GetString(byteRep);
    
    
    
                matchBinary = binarySearch.Matches(binaryString);
                foreach (Match match in matchBinary)
                    result = hexDump(match.Groups["binaryText"].Value);   // hexDump: helper (not shown) that formats the capture as a hex string
            }
        }
    Much of the above can be ignored ... it's used just to set up the example. The thrust of the suggestion is that the binary data is read in, taking the form of a string.

    One defines the Regex using something similar to the declaration of "Regex binarySearch" above, and uses something similar to "MatchCollection matchBinary" to collect the captures.

    Then one uses the foreach ... statement block to process each of the captures (assuming there'll be more'n one match in the file). Here on this machine I converted (ASCII-ized) the binary data to a hex-dump string and displayed it on the form.

    I dunno ... might work ... just a suggestion ... worth exactly what you paid for it.

    bill
    Last edited by ThermoSight; April 17th, 2011 at 12:57 PM.

  3. #3
    Join Date
    May 2007
    Posts
    1,546

    Re: Binary Pattern Search

    Unfortunately regex only works on strings, and you cannot safely convert an arbitrary byte array to a string. Any byte that can't be represented in the chosen charset is silently replaced (Encoding.ASCII substitutes '?', which is 0x3F), so the original value is lost. Run this test and you'll see what I mean: the two sequences are not the same, because they've been corrupted by the string conversion:

    Code:
    public static void Test ()
    {
        var bytes = new byte [1024];
        for (int i = 0; i < bytes.Length; i ++)
            bytes [i] = (byte) i;

        // round-trip the bytes through an ASCII string
        var result = Encoding.ASCII.GetBytes (Encoding.ASCII.GetString (bytes));

        for (int i = 0; i < bytes.Length; i ++)
            if (bytes [i] != result [i])
                Console.WriteLine ("Corrupt at: {0}", i);
    }
    Simplest thing to do is to create a 5 byte array, check if it matches. If it does not, move every element to the left by 1 and read one more byte into the end. Then repeat until you find a match or you reach the end of the stream.
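    The shift-and-read approach described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the class and method names (StreamPatternSearch, Find) are made up, and the stream is read one byte at a time for clarity rather than speed.

```csharp
using System;
using System.IO;

public static class StreamPatternSearch
{
    // Slides a pattern-sized window over the stream one byte at a time.
    // Returns the offset (counted from where reading started) of the first
    // byte of the first match, or -1 if the stream ends without a match.
    // On success the stream has been read up to the end of the match.
    public static long Find(Stream stream, byte[] pattern)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        var window = new byte[pattern.Length];
        long bytesRead = 0;
        int next;

        while ((next = stream.ReadByte()) != -1)
        {
            // Move every element one place to the left and append the new byte.
            Array.Copy(window, 1, window, 0, window.Length - 1);
            window[window.Length - 1] = (byte)next;
            bytesRead++;

            // Only compare once the window holds real data in every slot.
            if (bytesRead >= pattern.Length && WindowMatches(window, pattern))
                return bytesRead - pattern.Length;
        }
        return -1;
    }

    private static bool WindowMatches(byte[] window, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
            if (window[i] != pattern[i])
                return false;
        return true;
    }
}
```

    Because the search stops right after the match, the next 8 bytes read from the same stream should be the ones that follow the pattern.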
    www.monotorrent.com For all your .NET bittorrent needs

    NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.

  4. #4
    Join Date
    Oct 2005
    Location
    Seattle, WA U.S.A.
    Posts
    353

    Re: Binary Pattern Search

    Hi Mutant ...

    you are correct, Of Course, Sir.

    Indeed, after some time away from the desk, I returned a few minutes ago to play with this a bit and found that it seems that values are capped at 0x3f (or so it seemed in my case - which seems strange ... if MS was going to limit values, then I would have expected 0x7f). It was just coincidence that I didn't try binary values greater than 0x3f in my previous post.
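    For what it's worth, the 0x3f is probably not a cap on the values: 0x3F is the ASCII code for '?', which Encoding.ASCII substitutes for any byte above 0x7F. A minimal sketch of the effect (the class and method names are made up for illustration):

```csharp
using System;
using System.Text;

public static class AsciiRoundTrip
{
    // Converts the bytes to an ASCII string and back again.
    // Bytes above 0x7F have no ASCII character, so they come back as '?' (0x3F).
    public static byte[] RoundTrip(byte[] input)
    {
        string asText = Encoding.ASCII.GetString(input);
        return Encoding.ASCII.GetBytes(asText);
    }

    // Formats bytes as "08-3F-..." for easy comparison.
    public static string Dump(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

    Round-tripping { 0x08, 0x3F, 0x7F, 0x80, 0xFE } gives back 08-3F-7F-3F-3F: everything up to 0x7F survives, and the two larger values both turn into 0x3F.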

    Bummer, Man! My beloved Regex is restricted by the .NET framework. Bum-mer !

    But perhaps we can flip the problem around a bit. The following seems to work ...

    Code:
            StringBuilder bldr = new StringBuilder();
            string hexFormatString;
            string result;
            byte[] fileData = new byte[] {0xFe, 0x03, 0xCD, 0x00, 0x00, 0x00, 0x08, 0x00, 
                                          0xf2, 0x32, 0xcd, 0xfb, 0x0d, 0x2d, 0x03, 0x27, 
                                          0x01, 0x02, 0x03, 0x04, 0x05, 0x22, 0x21, 0x20,
                                          0x1f, 0x1e, 0x1d, 0x1c, 0x03, 0x04, 0x05};
            byte[] tmp = new byte[8];
            byte[] tmp2 = new byte[8];   // used by the alternative recovery procedure below
    
    
            Regex hexPattern = new Regex("0000000800(?<binaryText>.{16})");
    
            MatchCollection matchBinary;
    
    
    
    
            public Form1() {
                InitializeComponent();
    
                // create a string from the binary input
                // in hex-dump format
                foreach(byte thisbyte in fileData) {
                    bldr.Append(thisbyte.ToString("x2"));
                }
                hexFormatString = bldr.ToString();
    
                // then search the hex-dump-format string for the
                // pattern he seeks. 'Capture' the eight bytes which
                // follow the pattern.
                matchBinary = hexPattern.Matches(hexFormatString);
    
    
                // RECOVERY PROCEDURE
                // then for each 'capture' recover the captured bytes
                // as hex bytes (rather than their string representation)
                foreach (Match match in matchBinary) {
                    result = match.Groups["binaryText"].Value;
                    for(int i = 0; i < tmp.Length; i++)
                        tmp[i] = byte.Parse(result.Substring(2*i,2), NumberStyles.HexNumber);
                }
    
                // OR
    
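                // THE ALTERNATIVE RECOVERY PROCEDURE
                // (assumes the match starts on a byte boundary, i.e. at an
                // even index in the hex string, so Index / 2 is a byte offset)
                int idx;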
                foreach (Match match in matchBinary) {
                    idx = match.Groups["binaryText"].Index / 2;
                    for (int i = 0; i < tmp2.Length; i++)
                        tmp2[i] = fileData[idx++];
                }
    
    
            }
    Thanks.

    bill
    Last edited by ThermoSight; April 17th, 2011 at 11:31 PM.

  5. #5
    Join Date
    May 2007
    Posts
    1,546

    Re: Binary Pattern Search

    Using regex requires you to use a string. This means you need to read the entire binary blob into memory in one go. You'd then have to convert it to a string which takes up four times the memory of the original binary blob. Finally you have to run the regex and grab your output. Sure, it'll work, but it's pretty slow and inefficient so you could only really do it if your binary blob is small and you're only doing it a handful of times. Otherwise you're going to need to do it properly.

  6. #6
    Join Date
    Oct 2005
    Location
    Seattle, WA U.S.A.
    Posts
    353

    Re: Binary Pattern Search

    ho-hum ...

    Code:
    namespace Bozo
    {
        public partial class Form1 : Form {
    
            StringBuilder bldr = new StringBuilder();
            byte[] readFromFile;
    
    
            Regex hexPattern = new Regex("0000000800(?<binaryText>.{16})");
    
            MatchCollection matchBinary;
    
            FileStream bStream = new FileStream("C:\\Bozo.bin", FileMode.Open, FileAccess.Read);
            BinaryReader fruity;
    
    
            public Form1() {
                InitializeComponent();
    
                using (fruity = new BinaryReader(bStream)) {
                    readFromFile = fruity.ReadBytes(31);
                }
    
                // create a hex-dump formatted string 
                // from the binary file input data 
                foreach (byte thisbyte in readFromFile) {
                    bldr.Append(thisbyte.ToString("x2"));
                }
    
                // then search the hex-dump-format string for the
                // pattern he seeks. 'Capture' the eight bytes which
                // follow the pattern.
                matchBinary = hexPattern.Matches(bldr.ToString());
    
    
    
    
                // No recovery required ... just index the data bytes
                // in 'readFromFile' using the index supplied by Regex.
            }// end constructor
    
        }// end class Form1
    
    }// end Namespace 'Bozo'

    "Using regex requires you to use a string. This means you need to read the entire binary blob into memory in one go"

    I don't think so ... certainly I did in this case, but there are only 31 bytes ... If there were a large number of bytes, I think one could read it in segments.


    "You'd then have to convert it to a string which takes up four times the memory of the original binary blob."

    Certainly it would take more space, at least twice, but four times ? But if one is reading in small segments who really cares if the buffer is 1K, 2K or 4K bytes ?


    "Sure, it'll work, but it's pretty slow and inefficient so you could only really do it if your binary blob is small "

    I grant you that the conversion to a string is inefficient, but IMHO it's a small price to pay for the pleasure (and reliability) of working with Regular Expressions. And once the string is built, it's but a single statement to separate the wheat from the chaff ... and there's really no need for recovery ... Regular Expressions has already done the indexing for the user.

    That's what I call Instant Gratification.

    Bye for now, 'Fruit!
    Last edited by ThermoSight; April 18th, 2011 at 11:08 PM.

  7. #7
    Join Date
    May 2007
    Posts
    1,546

    Re: Binary Pattern Search

    "Certainly it would take more space, at least twice, but four times ?"

    A byte is 8 bits. A char is 16 bits. You convert each byte (8 bits) into two chars (2 x 16 = 32 bits), therefore using 4 times the memory.

    "But if one is reading in small segments who really cares if the buffer is 1K, 2K or 4K bytes ?"

    Which now means you have to handle the case where 1/2 your pattern is at the end of your buffer and 1/2 your pattern is at the start of the next buffer (whenever you read that). .NET regex has no streaming API unfortunately so it's not so simple to chunk read things.
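    One way to handle a pattern that straddles two buffers is to carry the last (pattern length - 1) bytes of each chunk over to the front of the next one. The following is only a sketch of that idea working directly on bytes; ChunkedPatternSearch, FindAll and the default chunk size are names and values picked for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedPatternSearch
{
    // Returns the offset of the first byte of every match, counted from
    // the position at which reading started.
    public static List<long> FindAll(Stream stream, byte[] pattern, int chunkSize = 4096)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var matches = new List<long>();
        int overlap = pattern.Length - 1;          // bytes carried between chunks
        var buffer = new byte[overlap + chunkSize];
        int carried = 0;                           // carried bytes currently at buffer[0..carried)
        long bufferStart = 0;                      // stream offset of buffer[0]
        int read;

        while ((read = stream.Read(buffer, carried, chunkSize)) > 0)
        {
            int valid = carried + read;

            // Check every position where a whole pattern fits in the valid data.
            for (int i = 0; i + pattern.Length <= valid; i++)
                if (MatchesAt(buffer, i, pattern))
                    matches.Add(bufferStart + i);

            // Keep the tail so a match split across chunks is still seen.
            // The tail is shorter than the pattern, so no match is reported twice.
            int keep = Math.Min(overlap, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
        return matches;
    }

    private static bool MatchesAt(byte[] buffer, int start, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
            if (buffer[start + j] != pattern[j])
                return false;
        return true;
    }
}
```

    With a deliberately tiny chunk size (say 4) the 5-byte pattern is guaranteed to span two reads, which makes the carry-over easy to test.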

    It's also worth pointing out that your regex can be trivially outperformed with a simple "string.IndexOf (string)" if you're going to go the route of converting your byte array to a string. This is a case where regex is definitely not required and shouldn't be used.
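    If the hex-string route is taken anyway, the IndexOf version might look something like the sketch below. HexStringSearch, ToHex and IndexOfAligned are illustrative names. One detail worth guarding against: in a hex dump the pattern can also turn up starting in the middle of a byte (at an odd character index), so this sketch skips those hits.

```csharp
using System;
using System.Text;

public static class HexStringSearch
{
    // Builds a lower-case hex dump, two characters per byte.
    public static string ToHex(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 2);
        foreach (byte b in data)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    // Returns the BYTE offset of the first occurrence of hexPattern that
    // starts on a byte boundary (even character index), or -1 if none.
    public static int IndexOfAligned(string hex, string hexPattern)
    {
        int start = 0;
        while (start <= hex.Length - hexPattern.Length)
        {
            int idx = hex.IndexOf(hexPattern, start, StringComparison.Ordinal);
            if (idx < 0)
                return -1;
            if (idx % 2 == 0)
                return idx / 2;      // two hex characters per byte
            start = idx + 1;         // misaligned hit: keep looking
        }
        return -1;
    }
}
```

    For example, the bytes 10 00 00 00 80 0F dump to "10000000800f", which contains "0000000800" at character index 1 even though the byte sequence 00 00 00 08 00 never occurs.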
    Last edited by Mutant_Fruit; April 19th, 2011 at 04:37 AM.

  8. #8
    Join Date
    Apr 2011
    Posts
    4

    Re: Binary Pattern Search

    Originally posted by Mutant_Fruit:
    "Simplest thing to do is to create a 5 byte array, check if it matches."

    Okay, but how would I see if it matches the hex value?


    All I really want is the offset of the last value of each match. If you understand me.

  9. #9
    Join Date
    May 2007
    Posts
    1,546

    Re: Binary Pattern Search

    Code:
    if (array [0] == 0x0 && array [1] == 0x0 && array [2] == 0x0 && array [3] == 0x8 && array [4] == 0x0)
        Console.WriteLine ("WE HAVE A MATCH!!!!");
    Bing

  10. #10
    Join Date
    Apr 2011
    Posts
    4

    Re: Binary Pattern Search

    Cheers Mutant Fruit, it works well.

    Is there any way that I could retrieve the offset of the last byte read, if there is a match?
    Last edited by Jay_bo; April 21st, 2011 at 03:57 PM.

  11. #11
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,016

    Re: Binary Pattern Search

    Code:
    // Returns the index of the last byte of the first match, if any; otherwise -1
    public static int indexOfMatch(byte[] data, byte[] pattern)
    {
        for (int i = 0; i < data.Length - pattern.Length + 1; i++)
        {
            bool isMatch = true;
            for (int j = 0; j < pattern.Length; j++)
            {
                if (data[i + j] != pattern[j])
                {
                    isMatch = false;
                    break;
                }
            }
            if (isMatch)
                return i + pattern.Length - 1;  // index of the last byte of the matched pattern
        }

        return -1;
    }
    Should just about do it for the general case?
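    To tie this back to the original question (read the 8 bytes after the pattern), here is a rough sketch that reuses the same search and then copies out the bytes that follow the match. PatternPayloadReader and ReadAfterMatch are hypothetical names, and it assumes the whole file is already in a byte array.

```csharp
using System;

public static class PatternPayloadReader
{
    // Index of the last byte of the first match, or -1 (same contract as indexOfMatch above).
    public static int IndexOfMatch(byte[] data, byte[] pattern)
    {
        for (int i = 0; i < data.Length - pattern.Length + 1; i++)
        {
            bool isMatch = true;
            for (int j = 0; j < pattern.Length; j++)
            {
                if (data[i + j] != pattern[j])
                {
                    isMatch = false;
                    break;
                }
            }
            if (isMatch)
                return i + pattern.Length - 1;
        }
        return -1;
    }

    // Returns the payloadLength bytes that follow the first match, or null
    // if there is no match or fewer than payloadLength bytes remain after it.
    public static byte[] ReadAfterMatch(byte[] data, byte[] pattern, int payloadLength)
    {
        int last = IndexOfMatch(data, pattern);
        if (last < 0)
            return null;

        int start = last + 1;                       // first byte after the pattern
        if (start + payloadLength > data.Length)
            return null;

        var payload = new byte[payloadLength];
        Array.Copy(data, start, payload, 0, payloadLength);
        return payload;
    }
}
```

    Run against the sample data from post #4 with the pattern 00 00 00 08 00 and a payload length of 8, this returns F2 32 CD FB 0D 2D 03 27.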
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
