Reading binary files in VBA... quickly!
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 15 of 15

Thread: Reading binary files in VBA... quickly!

  1. #1
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Lightbulb Reading binary files in VBA... quickly!

    The last few weeks I have been working on a more or less small and more or less simple VBA Sub in Excel 2003 that processes binary files. In particular, it builds a histogram of byte frequencies over the entire file. As the file in question can be some MB in size, speed is a concern.

    What you see here is already the second version. I have already switched from using the Scripting.FileSystemObject to the built-in file access primitives of VBA. This alone gave me a speed boost of about 50%. (Thanks to DataMiser for that tip in another thread. )

    Here we go:

    Code:
     
    Option Explicit
     
    Sub Count()
        Dim fd As FileDialog
        Dim fname As String
        Dim fnr As Integer
        Dim tstring As String * 1
        Dim Counts(255) As Long
        Dim i As Integer
        
        Set fd = Application.FileDialog(msoFileDialogFilePicker)
        fd.Title = "Bytes zählen"
        fd.AllowMultiSelect = False
        If fd.Show Then
            Application.Cursor = xlWait
            fname = fd.SelectedItems(1)
            fnr = FreeFile()
            Open fname For Binary Access Read As #fnr
            Range("FileSize").Formula = LOF(fnr) ' Test
            Do
                Get #fnr, , tstring
                If EOF(fnr) Then Exit Do
                Counts(Asc(tstring)) = Counts(Asc(tstring)) + 1
            Loop
            Close #fnr
            Range("Titel").Formula = "Bytehäufigkeiten in " & fname
            For i = 0 To 255
                Range("Häufigkeit").Cells(i + 1).Formula = Counts(i)
            Next
            Application.Cursor = xlDefault
        End If
        Set fd = Nothing
    End Sub
    Some messages and names of worksheet cells are in German, but they're only a few and not really relevant for the processing itself, so I think that's ok.

    In other languages like C/C++ I am used to read data like that in bigger chunks (e.g. 512 bytes) into a buffer of the appropriate size in order to reduce the overhead of file access API calls.

    But this is not that easy here: The Get command may read some bytes at a time as well, but it always fills the entire buffer, appending zeroes when it reaches EOF while doing so, thus giving me incorrect results. (And it does not return the number of bytes actually read from the file to me.) And as this routine is supposed to work on arbitrary binary files I can make no assumption that the file size is an integral multiple of any other buffer size than a single byte.

    It took me some thinking and programming to get to the following twist around that problem, that now looks a bit more complex:

    Code:
     
    Option Explicit
     
    Sub Count()
        Dim fd As FileDialog
        Dim fname As String
        Dim fnr As Integer
        Dim buffer As String
        Const maxbufsize As Integer = 512
        Dim filesize As Long, bufsize As Integer
        Dim Counts(255) As Long
        Dim i As Integer
        
        ' These are for testing:
        Dim starttime As Date
        
        Set fd = Application.FileDialog(msoFileDialogFilePicker)
        fd.Title = "Bytes zählen"
        fd.AllowMultiSelect = False
        If fd.Show Then
            Application.Cursor = xlWait
            starttime = Time ' Test
            buffer = String(maxbufsize, 0)
            fname = fd.SelectedItems(1)
            fnr = FreeFile()
            Open fname For Binary Access Read As #fnr
            filesize = LOF(fnr)
            Range("FileSize").Formula = filesize  ' Test
            Do
                bufsize = Min(maxbufsize, filesize - Seek(fnr) + 1)
                If bufsize > 0 Then
                    If bufsize < maxbufsize Then buffer = String(bufsize, 0)
                    Get #fnr, , buffer
                    For i = 1 To bufsize
                        Counts(Asc(Mid(buffer, i, 1))) = Counts(Asc(Mid(buffer, i, 1))) + 1
                    Next
                End If
            Loop Until bufsize < maxbufsize
            Close #fnr
            Range("Titel").Formula = "Byteh&#228;ufigkeiten in " & fname
            For i = 0 To 255
                Range("H&#228;ufigkeit").Cells(i + 1).Formula = Counts(i)
            Next
            Debug.Print "Elapsed time: "; CDate(Time - starttime)  ' Test
            Application.Cursor = xlDefault
        End If
        Set fd = Nothing
    End Sub
     
    Private Function Min(a As Variant, b As Variant) As Variant
        Min = IIf(a < b, a, b)
    End Function
    It works and gives me a speed boost of about another 200%!

    Am I still missing something significant and maybe obvious? I have seen, for example, the discussion about functions like Mid() versus Mid$() in another thread around here, but I think this would not make a significant difference here, does it?

    Any contributions welcome!
    Last edited by Eri523; June 29th, 2010 at 10:49 PM.

  2. #2
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    Depending on how large the file may be I will sometimes just set my buffer = filesize and read the entire file at once.

    Code:
    Open fname For Binary Access Read As #fnr
    buffer=space(LOF(fnr))
    get #fnr,,buffer
    close #fnr
    Always use [code][/code] tags when posting code.

  3. #3
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Re: Reading binary files in VBA... quickly!

    DataMiser,

    thanks for your opinion. I had actually been thinking about buffering the entire file too, but discarded the idea soon. This was mainly based on the assumtion that the maximum length of a String in VBA would be 32 kB or at most 64 kB. This belief was even more strengthened by the VBA help files that always talk of a 2-byte length specification (that does not apply to pure binary files, though) when it's about reading strings from a file. But your post made me look it up and: Surprise, the maximum length of a (variable length) String is 2 GB!

    And I doubt somehow that the Get command would accept such a large buffer. But of course I don't know how Get is implemented internally.

    At this time I start wondering what the function describing the relation between buffer size and speed gain might be. I doubt that a precise scientific analysis of that problem, e.g. like the big-Oh notation, could be done at all. And acquiring enough samples for a reasonable empiric analysis looks a bit labourious to me at the time...

    BTW: The file I did the speed tests with is a PDF downloaded from Intel (not especially for that purpose of course) of about 4.2 MB. Processing this file with the latest version of the VBA code I posted took about five to six seconds on a 1.8 GHz P4.

  4. #4
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    Would be interesting to see what the speed would be. While i have not tested this in VBA or even VB6 I have did a bit with VB.net. I did not however limit the testing to file read speed. I was reading data and sending it via a tcp stream. In the case where I used 512k blocks it was pretty fast but in the case where I read the entire file it was almost instant. In fact at first I thought I had did something wrong because it was so fast that it did not appear to be doing anything when in fact the entire file had been transfered.
    Always use [code][/code] tags when posting code.

  5. #5
    Join Date
    Jan 2006
    Location
    Chicago, IL
    Posts
    14,999

    Re: Reading binary files in VBA... quickly!

    Well, I bought a hard drive with a 32GB Buffer, and while it flies while running, shutting down the server sometimes takes a LONG time!
    David

    CodeGuru Article: Bound Controls are Evil-VB6
    2013 Samples: MS CODE Samples

    CodeGuru Reviewer
    2006 Dell CSP
    2006, 2007 & 2008 MVP Visual Basic
    If your question has been answered satisfactorily, and it has been helpful, then, please, Rate this Post!

  6. #6
    Join Date
    Apr 2009
    Posts
    394

    Re: Reading binary files in VBA... quickly!

    You will find that reading a large file in all at once will be an order of magnitude faster than having to go back and read in so many bytes at a time. But of course, you should perhaps put a limit of how large a file you do this with and if larger than that, you might want to break the read up into chuncks. Say 10 or 20 Mb as this will also be dependent upon the systems free memory as if you go too large, you will slow down once again as things goto the swap file...



    Good Luck

  7. #7
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Re: Reading binary files in VBA... quickly!

    Well, it looks like 50% of the "inhabitants" of this thread are voting for an as-large-as-possible buffer. And yes, vb5prgrmr, forcing the system to swap out memory would certainly eat up all the speed gains I had gathered before. Usually the task manager reports about 300 to 400 MB of free physical memory (not on the bare system, it is with the "usual stuff" running, including Excel) so there's still some room to expand into.

    In the meantime I did some simple math thinking about the topic and I think I can assume the following:
    • The effect of file access overhead is proportional to 1/maxbufsize, given that buffer size is the only parameter that's varied.
    • Under the same conditions as above, if changing the buffer size from 1 byte to 512 bytes gave me a speed gain of 200%, I will need a buffer of 256 kB to get another 200%.
    Did I miss something obvious here?

    Especially the second point is more or less a rough estimation because the 200% speed gain was measured between the two versions of the Sub in my initial post, so it was not only buffer size that was changed. But the innermost loop that I think can be supposed to consume the majority of processing time has not been changed that much.

    DataMiser, can I assume that in your TCP scenario, you pass the entire buffer to the TCP stream with a single API call? I think it's reasonable to assume that the speed of transporting the data via TCP is limited by network bandwidth (as long as it does not simply go through the loop-back device). And I additionally assume that your API call grabs the whole buffered data at once and does the actual data transfer asynchronously after returning control to your app. In that case it is very likely that the transfer will appear to be instant.

    dglienna, a hard drive with 32 GB of buffer space... It's not too long ago that this was a common size for the entire hard disk. (Well, on servers it is likely to be a bit longer ago.) Flushing a 32 GB buffer with probably lots of pending lazy-write requests may really take a considerable time, especially on a server where the requests are more likely to be scattered across the entire disk. Hopefully the disk firmware sorts the requests before writing to minimeze head movement. But I think current HDs do that even during regular operation, i.e. when not flushing their buffer.

  8. #8
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    The TCP transfer in the case I was referring to is done syncronously from a windows mobile device via a USB connection returning control to my code once the send is completed.
    Always use [code][/code] tags when posting code.

  9. #9
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    The TCP send goes to a server program that I have written, The basic jist of it is the Client sends a upload request with filename and size to the server program. The server responds and the client reads the entire file into a buffer, sends the entire buffer, server checks bytes received trhen sends OK if the bytes received = bytes expected.

    The entire process was so fast that basically the instant I hit the button on the client I saw the OK response indicating that all bytes had been received.

    When I did this with a 512 byte buffer it took about 5 seconds

    btw In this case I was sending a small mdb file about 2 megs
    Always use [code][/code] tags when posting code.

  10. #10
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Re: Reading binary files in VBA... quickly!

    Quote Originally Posted by DataMiser View Post
    The TCP transfer in the case I was referring to is done syncronously from a windows mobile device via a USB connection returning control to my code once the send is completed.
    I don't know how big your entire file was, but AFAIK an USB 2.0 connection could typically transfer the 512 k buffer in about 1/60 second. This surely would look like instant from a subjective human point of view.

  11. #11
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Re: Reading binary files in VBA... quickly!

    Quote Originally Posted by DataMiser View Post
    The TCP send goes to a server program that I have written, The basic jist of it is the Client sends a upload request with filename and size to the server program. The server responds and the client reads the entire file into a buffer, sends the entire buffer, server checks bytes received trhen sends OK if the bytes received = bytes expected.

    The entire process was so fast that basically the instant I hit the button on the client I saw the OK response indicating that all bytes had been received.

    When I did this with a 512 byte buffer it took about 5 seconds

    btw In this case I was sending a small mdb file about 2 megs
    Ok, I think I can assume here that your client and server run on different devices, connected by the USB link.

    5 seconds for the 512 k is far away from the speed of the USB link. Apparently that time has been eaten by something in the VB .NET or stuff that is called by it. I assume you tried it more than once to ensure that this is a reliable measurement.

    2 megs at full USB speed would be about 1/15 second. This might be notable by a human but still close to instant.

    But how in the name of all saints could your reply to my post arrive 11 minutes before the post itself!? This is far more than instant! Looks like I have just entered the twilight zone at last...

    BTW: On first sight I mistook the "jist" in your post as a typo of "just", but then the sentence IMO would not have made much sense. Looked it up in Wiktionary and just learned another english word!

  12. #12
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    512 Bytes or .5 kByte Blocks took 5 seconds, this was read 512 bytes, send them, read more ... I was a bit shocked myself at the speed difference.
    Always use [code][/code] tags when posting code.

  13. #13
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,724

    Re: Reading binary files in VBA... quickly!

    If I may horn in (being off topic):
    As I'm always interested in learning english, too, what's that fuzz about the word "jist"?

    When I looked it up in Wiktionary it says "alternate spelling of 'just'", not to mention that a normal dictionary wouldn't even contain the word.

  14. #14
    DataMiser is offline Super Moderator Power Poster
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,853

    Re: Reading binary files in VBA... quickly!

    Sorry, my spelling is not always so good and is a word that I seldom use. Should be gist rather than jist. although it is pronouced jist

    http://en.wikipedia.org/wiki/Gist
    Last edited by DataMiser; July 2nd, 2010 at 11:56 AM.
    Always use [code][/code] tags when posting code.

  15. #15
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,594

    Re: Reading binary files in VBA... quickly!

    WoF,

    yes, the meaning of "jist" that you found was the first thing I saw in that Wiktionary article too. But a few lines down, the same article told me that it also may be used as a noun, and then it is an alternative spelling of "gist". This usage is marked as rare though...

    The fact that these sites of the Wikimedia Foundation often have articles about stuff that can be considered rare is one of the facts that I really like about them. Some weeks ago on another forum (really non-developer stuff of course) I needed a translation of "Heiligenschein". My traditional english/german dictionary made of paper, although quite thick, doesn't know the word at all. This word is not really new and although it may not be considered common in everyday language, it is not really that exotic OTOH. The german Wiktionary, although itself not really "thick", had the word.

    Well, this was some more off-topic stuff...

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center