-
March 19th, 2008, 07:32 AM
#1
Improve data loading time, pre-caching?
Hi there.
I have a question on improving data loading times. The data sets that I am reading into my program consist of 300-500 files, each 512 kB in size.
I have noticed that loading times vary greatly: typical values range from 10 s to 30 s for loading the roughly 200 MB of data. However, when reading a data set that has recently been used (not necessarily the previous data set), loading time is around 1 s.
- How come?
- Are these files in some kind of cache?
- Since the hard drive cache is 16MB I would not expect several hundred MB of files to be in the disk cache.
- Is there some other kind of cache around that helps improving the reading speed?
- If so, would it be possible to order the OS to do some pre-caching if the next set of data files to be read is known in advance?
And finally, a code question: now I am reading the files using
Code:
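// Read one 512 kB file in a single call; the using block closes the file afterwards.
byte[] rawBytes;
using (BinaryReader binReader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read)))
    rawBytes = binReader.ReadBytes(512 * 1024);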
- is this the best way to do it or is there any other faster command to read lots of data?
____
Edit: Running VS2005 on a machine with WinXP SP2
Last edited by Cyanide; March 19th, 2008 at 09:34 AM.
-
March 19th, 2008, 10:46 AM
#2
Re: Improve data loading time, pre-caching?
 Originally Posted by Cyanide
when reading a data set that has recently been used (not necessarily the previous data set), loading time is around 1s.
- How come?
- Are these files in some kind of cache?
- Since the hard drive cache is 16MB I would not expect several hundred MB of files to be in the disk cache.
- Is there some other kind of cache around that helps improving the reading speed?
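The HD has a cache itself, but at 16 MB it can't explain what you're seeing. The OS buffers too: Windows keeps recently read file data in its file system cache in otherwise unused RAM, so re-reading a recently used data set is served from memory instead of from the disk.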
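One way to check whether a cache in RAM is responsible is to time the same load twice in a row. The sketch below is only an illustration; the class and method names (CacheTiming, TimeLoad) are made up for this example, and it assumes the whole data set fits in free memory.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class CacheTiming
{
    // Reads every file once and returns the elapsed time in milliseconds.
    // The bytes are discarded; only the time taken matters here.
    public static long TimeLoad(string[] paths)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (string path in paths)
        {
            File.ReadAllBytes(path);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling TimeLoad twice on the same set of paths and comparing the two numbers should show the effect: if the second call is much faster, the data came from memory rather than from the platters.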
is this the best way to do it or is there any other faster command to read lots of data?
There's an overloaded FileStream constructor that takes a 'FileOptions' value as a parameter. Setting FileOptions.SequentialScan may improve performance, as the OS can use it as a hint that the file will be read from start to end and can read ahead accordingly.
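A minimal sketch of that overload follows. The helper name (SequentialLoad.ReadAll) and the 4096-byte buffer size are arbitrary choices for illustration, not anything from the original code.

```csharp
using System.IO;

static class SequentialLoad
{
    // Reads a whole file through a FileStream opened with the SequentialScan hint.
    public static byte[] ReadAll(string fileName)
    {
        using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read,
            FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            byte[] rawBytes = new byte[stream.Length];
            int offset = 0;
            // Stream.Read may return fewer bytes than requested, so loop until done.
            while (offset < rawBytes.Length)
            {
                int read = stream.Read(rawBytes, offset, rawBytes.Length - offset);
                if (read == 0) break; // end of file reached earlier than expected
                offset += read;
            }
            return rawBytes;
        }
    }
}
```

The flag is only a hint, so whether it changes anything depends on the OS and the access pattern.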
www.monotorrent.com For all your .NET bittorrent needs
NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.
-
March 20th, 2008, 05:42 AM
#3
Re: Improve data loading time, pre-caching?
 Originally Posted by Mutant_Fruit
Setting FileOptions.SequentialScan may improve performance as the OS can use that as a hint that it should buffer the file.
Thanks for your response. I tried using the SequentialScan flag but could not notice any performance change at all. I assume that this flag can be very useful when reading one data object at a time in a loop; in my case, however, I read all the data at once, so it does not make any difference.
Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?
-
March 20th, 2008, 08:05 AM
#4
Re: Improve data loading time, pre-caching?
 Originally Posted by Cyanide
Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?
Open the files *before* you need them and pre-read them into memory? If you know you are loading 100 files into memory, you could use two dedicated threads to do the work. One thread would just open each file and read it into memory; the other thread would then do the processing and whatnot. That should keep the disk busy while the data is being processed.
I'm not sure what low-level API calls could be made to hint to the OS that you are going to read a file, but surely *opening* the file is the biggest hint you could possibly give.
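The two-thread idea could be sketched roughly as follows. FilePrefetcher is a hypothetical helper written for this post, with no error handling and no limit on how many files it buffers, so it assumes the whole data set fits in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical helper: a background thread reads files ahead of time,
// and the calling thread takes the loaded bytes in the same order.
class FilePrefetcher
{
    private readonly Queue<byte[]> loaded = new Queue<byte[]>();
    private readonly object gate = new object();
    private bool done;

    public void Start(string[] paths)
    {
        Thread reader = new Thread(delegate()
        {
            foreach (string path in paths)
            {
                byte[] data = File.ReadAllBytes(path);
                lock (gate) { loaded.Enqueue(data); Monitor.Pulse(gate); }
            }
            // Signal that no more files will arrive.
            lock (gate) { done = true; Monitor.PulseAll(gate); }
        });
        reader.IsBackground = true;
        reader.Start();
    }

    // Blocks until the next file is loaded; returns null once all files were handed out.
    public byte[] Take()
    {
        lock (gate)
        {
            while (loaded.Count == 0 && !done) Monitor.Wait(gate);
            return loaded.Count > 0 ? loaded.Dequeue() : null;
        }
    }
}
```

The processing thread calls Start once with the list of files, then calls Take in a loop until it returns null. If the next data set is known in advance, Start can be called for it while the current one is still being processed.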
-
March 22nd, 2008, 08:23 PM
#5
Re: Improve data loading time, pre-caching?
 Originally Posted by Cyanide
Thanks for your response. I tried using the SequentialScan flag but could not notice any performance change at all. I assume that this flag can be very useful when reading one data object at a time in a loop; in my case, however, I read all the data at once, so it does not make any difference.
Does anyone have any ideas on the pre-caching part? Is it at all possible to tell the OS or HD which files are going to be read soon, in order to improve performance?
I'm not very familiar with the functions/classes you are using, but convenience calls that hand back arrays or other data types already loaded are often slow. You might have some luck investigating memory-mapped files, a Win32 API mechanism that maps a section of a file into your process's address space. So, for example, instead of reading in 1/2 MB of data using the class, you could map, say, 64 MB of a file directly into memory and then access that memory block as a stream or whatever. It's more work, and you'd have to experiment with it to get the bugs out. You'd probably also want to find out how much physical memory the target system has and size your memory requests proportionately.
Check out http://pinvoke.net/ for C#-compatible declarations etc.
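For illustration only, here is roughly what mapping a window of a file looks like with the managed System.IO.MemoryMappedFiles classes. Those classes were added in .NET 4, so this assumes a newer framework than the one mentioned at the top of the thread, where P/Invoke would be needed instead. MappedWindow.Read is a made-up helper name.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedWindow
{
    // Maps the file read-only and copies `count` bytes starting at `offset`
    // out of the mapped view into a managed array.
    public static byte[] Read(string path, long offset, int count)
    {
        using (MemoryMappedFile map = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (MemoryMappedViewAccessor view = map.CreateViewAccessor(
            offset, count, MemoryMappedFileAccess.Read))
        {
            byte[] buffer = new byte[count];
            view.ReadArray(0, buffer, 0, count);
            return buffer;
        }
    }
}
```

Copying the view into an array like this gives up most of what a mapping offers; the technique is more interesting when the code works on the mapped view directly.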
-
March 22nd, 2008, 09:27 PM
#6
Re: Improve data loading time, pre-caching?
 Originally Posted by MilesAhead
You might have some luck investigating Memory Mapped Files. It's a Win API mechanism to map a section of a file into a memory buffer.
That won't help in this situation.
While no gain in performance is observed when using MMFs for simply reading a file into RAM...
From: http://msdn2.microsoft.com/en-us/library/ms810613.aspx
If you want faster loading, you'll need threading and you need to read the files into memory before you need to process them. In this scenario, memory mapped files are just an awkward way of doing:
Code:
string path = GetPathToFile();
byte[] data = File.ReadAllBytes(path);
EDIT:
I'm not very familiar with the functions/classes you are using but usually stuff that's set up very conveniently that returns arrays or other data types already loaded is slow
That's a very sweeping statement. Using IO as an example, I'm sure you'd find that reading a block of data at a time is faster than reading one byte at a time...
Last edited by Mutant_Fruit; March 22nd, 2008 at 09:29 PM.
-
March 26th, 2008, 05:17 PM
#7
Re: Improve data loading time, pre-caching?
 Originally Posted by Mutant_Fruit
That won't help in this situation.
From: http://msdn2.microsoft.com/en-us/library/ms810613.aspx
If you want faster loading, you'll need threading and you need to read the files into memory before you need to process them. In this scenario, memory mapped files are just an awkward way of doing:
Code:
string path = GetPathToFile();
byte[] data = File.ReadAllBytes(path);
EDIT:
That's a very sweeping statement. Using IO as an example, I'm sure you'd find that reading a block of data at a time is faster than reading one byte at a time...
That's great if you happen to have enough memory allocated to your process to ReadAllBytes. What if your database file is 12 GB and your process can only allocate 64 MB? If you cannot construct more efficient data reads than the library defaults then don't venture into it. Just use the cookie cutter code.
-
March 26th, 2008, 06:18 PM
#8
Re: Improve data loading time, pre-caching?
 Originally Posted by MilesAhead
That's great if you happen to have enough memory allocated to your process to ReadAllBytes. What if your database file is 12 GB and your process can only allocate 64 MB?
In his case, with 512 kB files, a ReadAllBytes is fine.
In the general case with large files, you can use the async BeginRead/EndRead to read the next chunk of data while processing the current chunk. Overlapping the two is usually faster than reading and processing synchronously.
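A rough sketch of that pattern, using two buffers that swap roles each round. ChunkedReader.ReadOverlapped and its process callback are hypothetical names for this example, and there is no error handling.

```csharp
using System;
using System.IO;

static class ChunkedReader
{
    // Calls `process(buffer, validByteCount)` for each chunk while the next
    // chunk is already being read. Returns the total number of bytes processed.
    public static long ReadOverlapped(string path, int chunkSize, Action<byte[], int> process)
    {
        long total = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            byte[] current = new byte[chunkSize];
            byte[] next = new byte[chunkSize];
            int read = fs.EndRead(fs.BeginRead(current, 0, chunkSize, null, null));
            while (read > 0)
            {
                // Start fetching the next chunk, then process the current one meanwhile.
                IAsyncResult pending = fs.BeginRead(next, 0, chunkSize, null, null);
                process(current, read);
                total += read;
                read = fs.EndRead(pending);
                // Swap buffers so the chunk just read becomes the current one.
                byte[] tmp = current; current = next; next = tmp;
            }
        }
        return total;
    }
}
```

Only two chunks are in memory at any time, so the file size no longer matters; the chunk size is the knob to experiment with.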