Reading chunks from large files
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 3 123 LastLast
Results 1 to 15 of 41

Thread: Reading chunks from large files

  1. #1
    Join Date
    Dec 2005
    Posts
    151

    Reading chunks from large files

    Hi, the code below works fine with files smaller than 2GB, but it also needs to be abe to handle larger files. I need to change the code to API but I have never really worked with those API before, only to get the size of files bigger than 2GB.

    EOF doesn't work and I think I also need to use SetFilePointer, but it turns into a mess.

    Can anyone please have a look at the code and tell what to change exactly? Thank you.

    Original code:
    Code:
    Public Function HashFile(ByVal strFilename As String)
                     
        Dim fFile As String, sTheString As String, sTheHash As String, sFinalHash As String
        Dim FF As Long, cCount As Long, cFile As Long, i As Integer
        Dim b() As Byte
    
        Const CHUNK_SIZE As Long = 9728000
        
        Dim cCrypto As CryptKci.clsCryptoAPI
        Set cCrypto = New CryptKci.clsCryptoAPI
            
            sTheHash = ""
            cFile = FileLen(strFilename)
            FF = FreeFile()
            ReDim b(0 To CHUNK_SIZE - 1)
    
            Open strFilename For Binary Access Read Lock Write As #FF
            
            Do While Not EOF(FF)
                i = i + 1
    
                If cFile < CHUNK_SIZE Then
                    ReDim b(0 To cFile - 1)
                    Get #FF, , b()
                    sTheString = StrConv(b(), vbUnicode)
                    sTheHash = sTheHash & cCrypto.CreateHash(sTheString, 2)
                    frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & cFile & vbCrLf
                    Exit Do
                Else
                    Get #FF, , b()
                    sTheString = StrConv(b(), vbUnicode)
                    sTheHash = sTheHash & cCrypto.CreateHash(sTheString, 2)
                    frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & "9728000" & vbCrLf
                End If
    
            cFile = cFile - CHUNK_SIZE
            DoEvents
            Loop
            
            Close #FF
            
            sFinalHash = cCrypto.ConvertStringFromHex(sTheHash)
            frmMain.txtHash.Text = cCrypto.CreateHash(sFinalHash, 2)
            
        Set cCrypto = Nothing
    End Function
    New code:
    Code:
    Const FILE_BEGIN = 0
    Const FILE_SHARE_READ = &H1
    Const FILE_SHARE_WRITE = &H2
    Const OPEN_EXISTING = 3
    Const GENERIC_READ = &H80000000
    
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, ByVal lpOverlapped As Any) As Long
    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Any, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
    Private Declare Function SetFilePointer Lib "kernel32" (ByVal hFile As Long, ByVal lDistanceToMove As Long, lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long
    
    Public Function HashFile(ByVal strFilename As String)
        
        Dim sTheString As String, sTheHash As String, sFinalHash As String
        Dim i As Integer, b() As Byte
        Dim hFile As Long, FileLenght As Long, Result As Long
    
        Dim cCrypto As CryptKci.clsCryptoAPI
        Set cCrypto = New CryptKci.clsCryptoAPI
        
        Const CHUNK_SIZE As Long = 9728000
        
            hFile = CreateFile(strFilename, GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, ByVal 0&, OPEN_EXISTING, 0, 0)
            sTheHash = ""
            FileLenght = GetFileSize(hFile, 0)
            SetFilePointer hFile, 0, 0, FILE_BEGIN
            ReDim b(0 To CHUNK_SIZE - 1)
            
            Do While Not EOF(hFile)
                i = i + 1
        
                If FileLenght < CHUNK_SIZE Then
                    ReDim b(0 To FileLenght - 1)
                    ReadFile hFile, b(1), UBound(b), Result, ByVal 0&
                    sTheString = StrConv(b(), vbUnicode)
                    sTheHash = sTheHash & cCrypto.CreateHash(sTheString, 2)
                    frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & FileLenght & vbCrLf
                    Exit Do
                Else
                    ReadFile hFile, b(1), UBound(b), Result, ByVal 0&
                    sTheString = StrConv(b(), vbUnicode)
                    sTheHash = sTheHash & cCrypto.CreateHash(sTheString, 2)
                    frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & "9728000" & vbCrLf
                End If
        
            FileLenght = FileLenght - CHUNK_SIZE
            DoEvents
            Loop
        
    
            sFinalHash = cCrypto.ConvertStringFromHex(sTheHash)
            frmMain.txtHash.Text = cCrypto.CreateHash(sFinalHash, 2)
        
        CloseHandle hFile
    
        Set cCrypto = Nothing
    End Function

  2. #2
    Join Date
    Aug 2000
    Location
    Essex, Uk
    Posts
    1,214

    Re: Reading chunks from large files

    I notice that in your original code you are using the 'old' method of opening and reading a file.

    I'm not totally sure, but it may be worth making a reference to the 'Microsoft Scripting Runtime' and using the methods in there to try and read your file.
    If you find my answers helpful, dont forget to rate me

  3. #3
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Hmm i got your problem in the bag.. ..

    to use API's to open, read and write files is not that difficult at all.

    Attached is a API Module you can include into your project and do calls very simular to OPEN, Read, Write, and they will be doen via Windows API's.. Added is a extra funtion allowing you to cut files shorter (Set a new end of file) Something that VB6 commands lack..

    for a working example of this api, follow the Hex-Editor link in my signature..

    Hope this Helps...

    Gremmy....
    ----- EDIT ------
    Updated API Module on later post...
    Last edited by GremlinSA; November 25th, 2006 at 12:16 PM. Reason: Removed Attachement..
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  4. #4
    Join Date
    Jul 2000
    Location
    Milano, Italy
    Posts
    7,726

    Re: Reading chunks from large files

    ...at present time, using mainly Net 4.0, Vs 2010



    Special thanks to Lothar "the Great" Haensler, Chris Eastwood , dr_Michael, ClearCode, Iouri and
    all the other wonderful people who made and make Codeguru a great place.
    Come back soon, you Gurus.

  5. #5
    Join Date
    Dec 2005
    Posts
    151

    Re: Reading chunks from large files

    Thanks for the suggestions. I'm trying GremlinSA's API module first and it makes it easier, but I'm still having some problems with it.

    It returns the wrong size of files bigger than 4GB. My test file is 4.617.568.256 bytes (4.29GB), but API_OpenFile returns a file size of 322.597.374 bytes (File_Len in the code below).

    Another problem I have is how to loop until the end of the file, because I can't use EOF. The final problem is how to read chunks of 9.728.000 bytes.

    I need to read 9.728.000 bytes from the file, hash it, read the next 9.728.000 bytes, hash it, until the end of the file. If the last chunk is smaller than 9.728.000 bytes, then the Chunk_Size needs to be changed to the size of the last chunk or else the hash code will be wrong.

    Code:
    Option Explicit
    
    Public File_Len As Long
    Public File_Num As Long
    
    Public Function HashFile(ByVal strFilename As String)
    
        Dim Seek_Len As Long
        Dim sString As String, sHash As String
        Dim b() As Byte
    
        Const Chunk_Size As Long = 9728000
        
    
            API_OpenFile strFilename, File_Num, File_Len
            Seek_Len = File_Len
            
            'do until what?
            
                If Seek_Len < Chunk_Size Then
                    API_ReadFile File_Num, Seek_Len, File_Len - Seek_Len, b()
                    sString = StrConv(b(), vbUnicode)
                    'sHash = sHash & cCrypto.CreateHash(sString, 2)
                    'frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & cFile & vbCrLf
                    'Exit Do
                Else
                    API_ReadFile File_Num, Seek_Len, Chunk_Size, b()
                    sString = StrConv(b(), vbUnicode)
                    'sHash = sHash & cCrypto.CreateHash(sString, 2)
                    'frmMain.txtHashSet.Text = frmMain.txtHashSet.Text & i & " - " & cCrypto.CreateHash(sTheString, 2) & " - " & "9728000" & vbCrLf
                End If
    
    
            Seek_Len = Seek_Len - Chunk_Size
            'Loop
    
    
    End Function

  6. #6
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Okay My bad on this one...

    I originaly wrote the module without using the High values in the API's mostly in these two
    Code:
    Private Declare Function SetFilePointer Lib "kernel32" (ByVal hFile As Long, ByVal lDistanceToMove As Long, lpDistanceToMoveHigh As Long, ByVal dwMoveMethod As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long
    Private
    so we just need to update them to use these values.

    The filesize sub could be updated as follows..
    Code:
    Public Sub API_FileSize(ByVal FileNumber As Long, ByRef FileSize As Currency)
    	Dim FileSizeL As Long
    	Dim FileSizeH As Long
    	FileSizeH = 0
    	FileSizeL = GetFileSize(FileNumber, FileSizeH)
    	FileSize = FileSizeL + (FileSizeH * &H10000 * &H10000) ' &H100000000 is not valid in the VB IDE so we just split it..
    	
    End Sub
    and the readfile sub can be updated as follows

    Code:
    Public Sub API_ReadFile(ByVal FileNumber As Long, ByVal Position As currency, ByRef BlockSize As Long, ByRef Data() As Byte)
    Dim PosL As Long
    Dim PosH As Long
    Dim SizeRead As Long
    Dim Ret As Long
    PosL = Position And &HFFFFFFFF
    PosH = (Position / &H10000 / &H10000) And &HFFFFFFFF
    Ret = SetFilePointer(FileNumber, PosL, PosH, FILE_BEGIN)
    Ret = ReadFile(FileNumber, Data(0), BlockSize, SizeRead, 0&)
    BlockSize = SizeRead
    End Sub
    the writefile sub will need the same update.

    this should give you an effective 922,337,203,685,477 Max filesize.

    It's a bit late now, so i will test it better in the morning, and repost a corrected API Module...

    Gremmy..
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  7. #7
    Join Date
    Dec 2005
    Posts
    151

    Re: Reading chunks from large files

    Thanks. I tried the changes so far, but I get a "ByRef argument type mismatch" error on the line (API_FileSize FileH, FileSize) in the API_OpenFile sub.

  8. #8
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Did you remember to change the Def-types in the calling subs to "Curency" - Filesize and position are now Currency types

    Gremmy...
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  9. #9
    Join Date
    Dec 2005
    Posts
    151

    Re: Reading chunks from large files

    I've changed them and the error message is gone, but now it returns 0 for the 4.29GB test file. With smaller files below 2GB it returns the correct filesize.

    I removed API_SetEOF and API_WriteFile, because I don't need them.
    Attached Files Attached Files

  10. #10
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    VB has a very serious problem working with any number larger than a long, we looking at some real workarounds to get past the 4G limit..

    Given enough time i'm sure that we could work out something...

    theres even a module that our uncle Jonny wrote to handle this sort of poblem, i'm going to look into incorperating it into this module to get it to work for you..

    Gremmy
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  11. #11
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Quote Originally Posted by Cimperiali
    I just looked at this very closely...

    It's essentially the same code as in my Module, with a little difference. Your still left with the task of splitting the file position variable into 2 - 32 bit words, and this is where most of us get stuck, considering that VB does not have a Unsigned 4 byte number Type....

    As soon as a file passes the 2gig limit(2147483647 bytes) pasing the hex value to a Long would return a negative value (-2147483648), And this is the limitation that we have to work around...

    Gremmy...
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  12. #12
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Well i think ive done it, And the wife is a little peeved that i'v sitting infront of the PC at 2:00 in the morning, but anyways here it is...

    I've added two small subs to convert a Currency value into two Long's and back, that will allow you to work with large files.. (tested in 1gig steps right up to 15 gig)...

    Enjoy..

    Please just let us know if it works 2 your satisfaction...

    Gremmy....

    --- Edit ----
    New attachment on next post..
    Last edited by GremlinSA; November 25th, 2006 at 12:13 PM. Reason: Removed attachement
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  13. #13
    Join Date
    Dec 2005
    Posts
    151

    Re: Reading chunks from large files

    Yes, the filesize works fine now. Thank you very much for your help

    The problem I have now is that I get an overflow error when I use the code below as a test and reach the 2GB.

    I get the error on the line: "PosL = Position And &HFFFFFFFF" in the API_ReadFile sub. I tried to change some things myself, but I keep getting the overflow error.

    Code:
    Private b(9727999) As Byte
    
    Private Sub Command1_Click()
    Dim sString As String, cCount As Double, i As Long
    
    Do Until i = 470
        i = i + 1
        
        API_ReadFile File_Num, cCount, 9728000, b()
            
        cCount = CDbl(cCount) + CDbl(9728000)
    
    DoEvents
    Loop
    End Sub

  14. #14
    Join Date
    Jun 2005
    Location
    JHB South Africa
    Posts
    3,772

    Re: Reading chunks from large files

    Oops i updated the write API, but not the Read API...

    Fixed it...

    Gremmy....
    Attached Files Attached Files
    Articles VB6 : Break the 2G limit - Animation 1, 2 VB.NET : 2005/8 : Moving Images , Animation 1 , 2 , 3 , User Controls
    WPF Articles : 3D Animation 1 , 2 , 3
    Code snips: VB6 Hex Edit, IP Chat, Copy Prot., Crop, Zoom : .NET IP Chat (V4), Adv. ContextMenus, click Hotspot, Scroll Controls
    Find me in ASP.NET., VB6., VB.NET , Writing Articles, My Genealogy, Forum
    All VS.NET: posts refer to VS.NET 2008 (Pro) unless otherwise stated.

  15. #15
    Join Date
    Dec 2005
    Posts
    151

    Re: Reading chunks from large files

    Excellent. Thank you

    Is the test code in my previous post the fastest way to read a file in 9728000 byte chunks or do you maybe have any tips to speed it up?

Page 1 of 3 123 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Azure Activities Information Page

Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center