CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 15 of 15

Threaded View

  1. #1
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Lightbulb Reading binary files in VBA... quickly!

    The last few weeks I have been working on a more or less small and more or less simple VBA Sub in Excel 2003 that processes binary files. In particular, it builds a histogram of byte frequencies over the entire file. As the file in question can be some MB in size, speed is a concern.

    What you see here is already the second version. I have already switched from using the Scripting.FileSystemObject to the built-in file access primitives of VBA. This alone gave me a speed boost of about 50%. (Thanks to DataMiser for that tip in another thread. )

    Here we go:

    Code:
     
    Option Explicit
     
    Sub Count()
        Dim fd As FileDialog
        Dim fname As String
        Dim fnr As Integer
        Dim tstring As String * 1
        Dim Counts(255) As Long
        Dim i As Integer
        
        Set fd = Application.FileDialog(msoFileDialogFilePicker)
        fd.Title = "Bytes zählen"
        fd.AllowMultiSelect = False
        If fd.Show Then
            Application.Cursor = xlWait
            fname = fd.SelectedItems(1)
            fnr = FreeFile()
            Open fname For Binary Access Read As #fnr
            Range("FileSize").Formula = LOF(fnr) ' Test
            Do
                Get #fnr, , tstring
                If EOF(fnr) Then Exit Do
                Counts(Asc(tstring)) = Counts(Asc(tstring)) + 1
            Loop
            Close #fnr
            Range("Titel").Formula = "Bytehäufigkeiten in " & fname
            For i = 0 To 255
                Range("Häufigkeit").Cells(i + 1).Formula = Counts(i)
            Next
            Application.Cursor = xlDefault
        End If
        Set fd = Nothing
    End Sub
    Some messages and names of worksheet cells are in German, but they're only a few and not really relevant for the processing itself, so I think that's ok.

    In other languages like C/C++ I am used to read data like that in bigger chunks (e.g. 512 bytes) into a buffer of the appropriate size in order to reduce the overhead of file access API calls.

    But this is not that easy here: The Get command may read some bytes at a time as well, but it always fills the entire buffer, appending zeroes when it reaches EOF while doing so, thus giving me incorrect results. (And it does not return the number of bytes actually read from the file to me.) And as this routine is supposed to work on arbitrary binary files I can make no assumption that the file size is an integral multiple of any other buffer size than a single byte.

    It took me some thinking and programming to get to the following twist around that problem, that now looks a bit more complex:

    Code:
     
    Option Explicit
     
    Sub Count()
        Dim fd As FileDialog
        Dim fname As String
        Dim fnr As Integer
        Dim buffer As String
        Const maxbufsize As Integer = 512
        Dim filesize As Long, bufsize As Integer
        Dim Counts(255) As Long
        Dim i As Integer
        
        ' These are for testing:
        Dim starttime As Date
        
        Set fd = Application.FileDialog(msoFileDialogFilePicker)
        fd.Title = "Bytes zählen"
        fd.AllowMultiSelect = False
        If fd.Show Then
            Application.Cursor = xlWait
            starttime = Time ' Test
            buffer = String(maxbufsize, 0)
            fname = fd.SelectedItems(1)
            fnr = FreeFile()
            Open fname For Binary Access Read As #fnr
            filesize = LOF(fnr)
            Range("FileSize").Formula = filesize  ' Test
            Do
                bufsize = Min(maxbufsize, filesize - Seek(fnr) + 1)
                If bufsize > 0 Then
                    If bufsize < maxbufsize Then buffer = String(bufsize, 0)
                    Get #fnr, , buffer
                    For i = 1 To bufsize
                        Counts(Asc(Mid(buffer, i, 1))) = Counts(Asc(Mid(buffer, i, 1))) + 1
                    Next
                End If
            Loop Until bufsize < maxbufsize
            Close #fnr
            Range("Titel").Formula = "Byteh&#228;ufigkeiten in " & fname
            For i = 0 To 255
                Range("H&#228;ufigkeit").Cells(i + 1).Formula = Counts(i)
            Next
            Debug.Print "Elapsed time: "; CDate(Time - starttime)  ' Test
            Application.Cursor = xlDefault
        End If
        Set fd = Nothing
    End Sub
     
    Private Function Min(a As Variant, b As Variant) As Variant
        Min = IIf(a < b, a, b)
    End Function
    It works and gives me a speed boost of about another 200%!

    Am I still missing something significant and maybe obvious? I have seen, for example, the discussion about functions like Mid() versus Mid$() in another thread around here, but I think this would not make a significant difference here, does it?

    Any contributions welcome!
    Last edited by Eri523; June 29th, 2010 at 09:49 PM.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured