-
June 29th, 2010, 07:47 PM
#1
Reading binary files in VBA... quickly!
The last few weeks I have been working on a more or less small and more or less simple VBA Sub in Excel 2003 that processes binary files. In particular, it builds a histogram of byte frequencies over the entire file. As the file in question can be some MB in size, speed is a concern.
What you see here is already the second version. I have already switched from using the Scripting.FileSystemObject to the built-in file access primitives of VBA. This alone gave me a speed boost of about 50%. (Thanks to DataMiser for that tip in another thread. )
Here we go:
Code:
Option Explicit
Sub Count()
Dim fd As FileDialog
Dim fname As String
Dim fnr As Integer
Dim tstring As String * 1
Dim Counts(255) As Long
Dim i As Integer
Set fd = Application.FileDialog(msoFileDialogFilePicker)
fd.Title = "Bytes zählen"
fd.AllowMultiSelect = False
If fd.Show Then
Application.Cursor = xlWait
fname = fd.SelectedItems(1)
fnr = FreeFile()
Open fname For Binary Access Read As #fnr
Range("FileSize").Formula = LOF(fnr) ' Test
Do
Get #fnr, , tstring
If EOF(fnr) Then Exit Do
Counts(Asc(tstring)) = Counts(Asc(tstring)) + 1
Loop
Close #fnr
Range("Titel").Formula = "Bytehäufigkeiten in " & fname
For i = 0 To 255
Range("Häufigkeit").Cells(i + 1).Formula = Counts(i)
Next
Application.Cursor = xlDefault
End If
Set fd = Nothing
End Sub
Some messages and names of worksheet cells are in German, but they're only a few and not really relevant for the processing itself, so I think that's ok.
In other languages like C/C++ I am used to read data like that in bigger chunks (e.g. 512 bytes) into a buffer of the appropriate size in order to reduce the overhead of file access API calls.
But this is not that easy here: The Get command may read some bytes at a time as well, but it always fills the entire buffer, appending zeroes when it reaches EOF while doing so, thus giving me incorrect results. (And it does not return the number of bytes actually read from the file to me.) And as this routine is supposed to work on arbitrary binary files I can make no assumption that the file size is an integral multiple of any other buffer size than a single byte.
It took me some thinking and programming to get to the following twist around that problem, that now looks a bit more complex:
Code:
Option Explicit
Sub Count()
Dim fd As FileDialog
Dim fname As String
Dim fnr As Integer
Dim buffer As String
Const maxbufsize As Integer = 512
Dim filesize As Long, bufsize As Integer
Dim Counts(255) As Long
Dim i As Integer
' These are for testing:
Dim starttime As Date
Set fd = Application.FileDialog(msoFileDialogFilePicker)
fd.Title = "Bytes zählen"
fd.AllowMultiSelect = False
If fd.Show Then
Application.Cursor = xlWait
starttime = Time ' Test
buffer = String(maxbufsize, 0)
fname = fd.SelectedItems(1)
fnr = FreeFile()
Open fname For Binary Access Read As #fnr
filesize = LOF(fnr)
Range("FileSize").Formula = filesize ' Test
Do
bufsize = Min(maxbufsize, filesize - Seek(fnr) + 1)
If bufsize > 0 Then
If bufsize < maxbufsize Then buffer = String(bufsize, 0)
Get #fnr, , buffer
For i = 1 To bufsize
Counts(Asc(Mid(buffer, i, 1))) = Counts(Asc(Mid(buffer, i, 1))) + 1
Next
End If
Loop Until bufsize < maxbufsize
Close #fnr
Range("Titel").Formula = "Bytehäufigkeiten in " & fname
For i = 0 To 255
Range("Häufigkeit").Cells(i + 1).Formula = Counts(i)
Next
Debug.Print "Elapsed time: "; CDate(Time - starttime) ' Test
Application.Cursor = xlDefault
End If
Set fd = Nothing
End Sub
Private Function Min(a As Variant, b As Variant) As Variant
Min = IIf(a < b, a, b)
End Function
It works and gives me a speed boost of about another 200%!
Am I still missing something significant and maybe obvious? I have seen, for example, the discussion about functions like Mid() versus Mid$() in another thread around here, but I think this would not make a significant difference here, does it?
Any contributions welcome!
Last edited by Eri523; June 29th, 2010 at 09:49 PM.
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|