  1. #1
    Join Date: Jan 2003 | Location: Sweden | Posts: 115

    Improve data loading time, pre-caching?

    Hi there.
    I have a question about improving data loading times. The data sets that I read into my program consist of 300-500 files, each 512 kB in size.

    I have noticed that loading times vary greatly: it normally takes 10s to 30s to load the roughly 200 MB of data. However, when reading a data set that has recently been used (not necessarily the previous data set), the loading time is around 1s.

    - How come?
    - Are these files in some kind of cache?
    - Since the hard drive cache is 16MB I would not expect several hundred MB of files to be in the disk cache.
    - Is there some other kind of cache around that helps improve the reading speed?
    - If so, would it be possible to tell the OS to do some pre-caching if the next set of data files to be read is known in advance?

    And finally, a code question: now I am reading the files using
    Code:
    byte[] rawBytes;
    using (BinaryReader binReader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read)))
        rawBytes = binReader.ReadBytes(512 * 1024); // read the whole 512 kB file
    - Is this the best way to do it, or is there a faster way to read lots of data?
    ____
    Edit: Running VS2005 on a machine with WinXP SP2
    Last edited by Cyanide; March 19th, 2008 at 09:34 AM.

  2. #2
    Join Date: May 2007 | Posts: 1,546

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by Cyanide
    when reading a data set that has recently been used (not necessarily the previous data set), loading time is around 1s.

    - How come?
    - Are these files in some kind of cache?
    - Since the hard drive cache is 16MB I would not expect several hundred MB of files to be in the disk cache.
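    - Is there some other kind of cache around that helps improve the reading speed?
    The HD has its own cache, and the OS does buffering too: Windows keeps recently read file data in RAM in its file system cache, which can hold far more than the drive's 16 MB and would explain the 1s reloads.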

    is this the best way to do it or is there any other faster command to read lots of data?
    There's an overloaded FileStream constructor that takes a 'FileOptions' value as a parameter. Setting FileOptions.SequentialScan may improve performance, as the OS can use it as a hint that the file will be read from start to finish and buffer it accordingly.
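
    A minimal sketch of that overload in use. The class name, method name and the 4096-byte buffer size are just picked for illustration, and whether the hint helps at all will depend on the machine and the access pattern:

```csharp
using System;
using System.IO;

static class SequentialRead
{
    // Reads a whole file through a FileStream opened with the SequentialScan hint.
    // ReadWholeFile is a hypothetical helper name, not part of the framework.
    public static byte[] ReadWholeFile(string path)
    {
        // 4096 is an arbitrary buffer size chosen for this sketch.
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                                  FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            byte[] data = new byte[stream.Length];
            int offset = 0;

            // Read may return fewer bytes than requested, so loop until the array is full.
            while (offset < data.Length)
            {
                int read = stream.Read(data, offset, data.Length - offset);
                if (read == 0)
                    throw new EndOfStreamException("File ended before the expected length.");
                offset += read;
            }
            return data;
        }
    }
}
```

    Usage would be something like byte[] rawBytes = SequentialRead.ReadWholeFile(fileName);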
    www.monotorrent.com For all your .NET bittorrent needs

    NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.

  3. #3
    Join Date: Jan 2003 | Location: Sweden | Posts: 115

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by Mutant_Fruit
    Setting FileOptions.SequentialScan may improve performance as the OS can use that as a hint that it should buffer the file.
    Thanks for your response. I tried using the SequentialScan flag but could not notice any performance change at all. I assume that this flag can be very useful when reading one data object at a time in a loop; in my case, however, I read all the data at once, so it does not make any difference.

    Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?

  4. #4
    Join Date: May 2007 | Posts: 1,546

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by Cyanide
    Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?
    Open the files *before* you need them and pre-read them into memory? If you know you are loading 100 files into memory, you could use two dedicated threads to do the work. One thread would just open each file and read it into memory; the other would then process the data that has already been read. That'd offer the best performance.

    I'm not sure what low-level API calls could be made to hint to the OS that you are going to read a file, but surely *opening* the file is the biggest hint you could possibly give.
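
    A rough sketch of the two-thread idea, assuming a .NET version that has Task and BlockingCollection (4.5 or later, so newer than VS2005). Prefetcher, LoadAndProcess and maxBuffered are names made up for this example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class Prefetcher
{
    // Reads every file on a background task while the caller processes the
    // files that are already in memory. maxBuffered caps how many files may
    // wait in the queue, so memory use stays bounded.
    public static void LoadAndProcess(IEnumerable<string> paths,
                                      Action<string, byte[]> process,
                                      int maxBuffered)
    {
        using (var queue = new BlockingCollection<KeyValuePair<string, byte[]>>(maxBuffered))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                        queue.Add(new KeyValuePair<string, byte[]>(path, File.ReadAllBytes(path)));
                }
                finally
                {
                    // Always signal completion so the consumer loop can end,
                    // even if a read throws.
                    queue.CompleteAdding();
                }
            });

            // Consume on the calling thread, in the order the files were read.
            foreach (var item in queue.GetConsumingEnumerable())
                process(item.Key, item.Value);

            reader.Wait(); // rethrows any read error wrapped in an AggregateException
        }
    }
}
```

    The bounded queue is a design choice for the sketch: the reader blocks once maxBuffered files are waiting, instead of pulling the whole data set into RAM ahead of the processing thread.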

  5. #5
    Join Date: Nov 2007 | Posts: 35

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by Cyanide
    Thanks for your response. I tried using the SequentialScan flag but could not notice any performance change at all. I assume that this flag can be very useful when reading one data object at a time in a loop; in my case, however, I read all the data at once, so it does not make any difference.

    Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?
    I'm not very familiar with the functions/classes you are using, but usually the convenient calls that return arrays or other data types already loaded are slow. You might have some luck investigating Memory Mapped Files. It's a Win API mechanism to map a section of a file into a memory buffer. So for example, instead of reading in 1/2 MB of data using the class, you could map, say, 64 MB of a file directly into allocated memory, then access that memory block as a stream or whatever. It's more work and you'd have to mess with it to get the bugs out. Also, you'd probably want to find out how much physical memory is in the system the program is running on and size your memory requests in proportion.

    Check out http://pinvoke.net/ for C#-compatible declarations etc.
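
    For what it's worth, newer frameworks (.NET 4 and later) wrap this in the System.IO.MemoryMappedFiles namespace, so P/Invoke is not strictly needed there. A minimal sketch under that assumption; MappedRead and ReadRange are made-up names, and this only shows the mechanism, not that it is any faster:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedRead
{
    // Maps the file and copies `count` bytes starting at `offset` into an array.
    // CreateFromFile(path, FileMode.Open) maps the file read/write by default,
    // so the file must be writable; the file must also not be empty.
    public static byte[] ReadRange(string path, long offset, int count)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor view =
                   mmf.CreateViewAccessor(offset, count, MemoryMappedFileAccess.Read))
        {
            byte[] buffer = new byte[count];
            // Position 0 is relative to the start of the view, not of the file.
            view.ReadArray(0, buffer, 0, count);
            return buffer;
        }
    }
}
```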

  6. #6
    Join Date: May 2007 | Posts: 1,546

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by MilesAhead
    You might have some luck investigating Memory Mapped Files. It's a Win API mechanism to map a section of a file into a memory buffer.
    That won't help in this situation.

    While no gain in performance is observed when using MMFs for simply reading a file into RAM...
    From: http://msdn2.microsoft.com/en-us/library/ms810613.aspx

    If you want faster loading, you'll need threading and you need to read the files into memory before you need to process them. In this scenario, memory mapped files are just an awkward way of doing:
    Code:
    string path = GetPathToFile();
    byte[] data = File.ReadAllBytes(path);
    EDIT:
    I'm not very familiar with the functions/classes you are using but usually stuff that's set up very conveniently that returns arrays or other data types already loaded is slow
    That's a very sweeping statement. Using IO as an example, I'm sure you'd find that reading a block of data at a time is faster than reading one byte at a time...
    Last edited by Mutant_Fruit; March 22nd, 2008 at 09:29 PM.

  7. #7
    Join Date: Nov 2007 | Posts: 35

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by Mutant_Fruit
    That won't help in this situation.


    From: http://msdn2.microsoft.com/en-us/library/ms810613.aspx

    If you want faster loading, you'll need threading and you need to read the files into memory before you need to process them. In this scenario, memory mapped files are just an awkward way of doing:
    Code:
    string path = GetPathToFile();
    byte[] data = File.ReadAllBytes(path);
    EDIT:

    That's a very sweeping statement. Using IO as an example, I'm sure you'd find that reading a block of data at a time is faster than reading one byte at a time...
    That's great if you happen to have enough memory allocated to your process to ReadAllBytes. What if your database file is 12 GB and your process can only allocate 64 MB? If you cannot construct more efficient data reads than the library defaults then don't venture into it. Just use the cookie cutter code.

  8. #8
    Join Date: May 2007 | Posts: 1,546

    Re: Improve data loading time, pre-caching?

    Quote Originally Posted by MilesAhead
    That's great if you happen to have enough memory allocated to your process to ReadAllBytes. What if your database file is 12 GB and your process can only allocate 64 MB?
    In his case, with 512 kB files, a ReadAllBytes is fine.

    In the general case with large files, you can use the async BeginRead/EndRead to read the next chunk of data while processing the current chunk. Overlapping the two like that is faster than reading and processing synchronously.
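
    Roughly like this, as a sketch only. ChunkedReader, Process and handleChunk are names invented for the example, and it uses two buffers that get swapped so a read never writes into the chunk being processed:

```csharp
using System;
using System.IO;

static class ChunkedReader
{
    // Reads the file in chunkSize pieces and starts the next read before
    // handling the chunk that has just arrived. handleChunk receives the
    // buffer and the number of valid bytes in it.
    public static void Process(string path, int chunkSize, Action<byte[], int> handleChunk)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                                  FileShare.Read, chunkSize, FileOptions.Asynchronous))
        {
            byte[] current = new byte[chunkSize];
            byte[] next = new byte[chunkSize];

            int bytesRead = stream.Read(current, 0, chunkSize);
            while (bytesRead > 0)
            {
                // Kick off the next read, then process the current chunk meanwhile.
                IAsyncResult pending = stream.BeginRead(next, 0, chunkSize, null, null);
                handleChunk(current, bytesRead);
                bytesRead = stream.EndRead(pending); // 0 means end of file

                // Swap buffers so the chunk just read becomes the current one.
                byte[] tmp = current;
                current = next;
                next = tmp;
            }
        }
    }
}
```

    The handler must finish with the buffer before it returns, since that buffer is reused for a later read.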
