Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
If you have the time (and the same curiosity as me), I'd be interested in an
HD Tach screenshot.
Or you could run my I/O profiling code:
http://forums.codeguru.com/showthrea...64#post1801264 (code attached at end of thread). You'll need to run the code posted in the thread first to create the files that profiling code works on.
gg
using stdio_read(), the i/o profiling code gives
Code:
Reading large file...please wait
Reading file chunks...please wait
Big file size = 2147483647
Chunk file size = 10485760
# Chunks = 205
Chunk Total = 2149580800
Test = stdio_read
Buffer size = 4 K
Time to read big file = 433809 ms
Time to read chunks = 453293 ms
Big throughput = 4.9503 MB/s
Chunk throughput = 4.74214 MB/s
Throughput %Diff = -4.29517 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 230442 ms
Time to read chunks = 261077 ms
Big throughput = 9.31898 MB/s
Chunk throughput = 8.23351 MB/s
Throughput %Diff = -12.3682 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 229467 ms
Time to read chunks = 258999 ms
Big throughput = 9.35857 MB/s
Chunk throughput = 8.29957 MB/s
Throughput %Diff = -11.9945 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 234273 ms
Time to read chunks = 262008 ms
Big throughput = 9.16659 MB/s
Chunk throughput = 8.20426 MB/s
Throughput %Diff = -11.0798 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 228264 ms
Time to read chunks = 257756 ms
Big throughput = 9.40789 MB/s
Chunk throughput = 8.3396 MB/s
Throughput %Diff = -12.0389 %
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 229798 ms
Time to read chunks = 257806 ms
Big throughput = 9.34509 MB/s
Chunk throughput = 8.33798 MB/s
Throughput %Diff = -11.3907 %
Using win32_read() gives "GetOverlappedResult failed, error 38" (ERROR_HANDLE_EOF: Reached the end of the file.) when reading file chunks.
Re: Multiple thread question
Quote:
unless each thread decides on his own which files to process,
Each thread only processes one file which is indicated by the function parameter.
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Each thread only processes one file which is indicated by the function parameter.
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of IO operations of the serial version and not to the "singlethreadedness" in itself ...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
Re: Multiple thread question
Quote:
Originally Posted by
superbonzo
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of IO operations of the serial version and not to the "singlethreadedness" in itself ...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
ehm, why not ? it should be very simple to quickly test if this is the case; it's just a matter of splitting a loop in two, isn't it ? in any case, you could also process a fixed number of files ( say ~100 ) at a time to see if it makes a difference ...
Re: Multiple thread question
>> Using win32_read() gives GetOverlappedResult failed error 38
Yes, the overlapped framework I posted in that thread isn't quite right. There's a corrected framework here: http://cboard.cprogramming.com/cplus...ml#post1192325 (To be honest, my MD5 code used to use that broken framework and hit that same bug at a client's site.)
>> Big throughput = 9.40789 MB/s
>> Chunk throughput = 8.3396 MB/s
Those aren't spectacular numbers compared to my PCIe, 2xSataII, RAID 0 setup, which gave high 30's, low 40's. I'm starting to align with OReubens' theory that the NAS device (via its network connectivity) really shines when there are multiple requests being processed at the same time.
gg
Re: Multiple thread question
>> Do I really want a vector of several hundred fstream objects?
You would need a pointer since streams are non-copyable.
gg
Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
You would need a pointer since streams are non-copyable.
.. but they're movable if you have a c++11 compiler, so a vector<fstream> would be ok in that case.
Re: Multiple thread question
Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh!!!!!!!!!!!!!!!
Hold the front page. Stop the presses.
Other gurus have commented that they found these findings extremely peculiar. So do I. That's why I've investigated this deeply, as I don't like peculiar things that can't be explained.
I've pressed the network guys on this who have now done their own investigations and they have found problems. Something to do with incompatibilities between the configuration of the NAS device, the firewalls and the network managed switches. They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes. There are red faces all round in that team. Just as well it's a weekend.:eek:
My thanks to everyone who contributed.
Re: [RESOLVED] Multiple thread question
Cool!
What does win32_read() spit out now?
gg
Re: [RESOLVED] Multiple thread question
These are the new timings using stdio_read again for comparison purposes. The network people have really pulled out all the stops now on this following 'a few words I had with them'. ;)
Code:
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 38932 ms
Time to read chunks = 81665 ms
Big throughput = 55.1599 MB/s
Chunk throughput = 26.3219 MB/s
Throughput %Diff = -70.7837 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 37578 ms
Time to read chunks = 55113 ms
Big throughput = 57.1474 MB/s
Chunk throughput = 39.0032 MB/s
Throughput %Diff = -37.7413 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 38507 ms
Time to read chunks = 41789 ms
Big throughput = 55.7687 MB/s
Chunk throughput = 51.4389 MB/s
Throughput %Diff = -8.07731 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 37595 ms
Time to read chunks = 40514 ms
Big throughput = 57.1215 MB/s
Chunk throughput = 53.0577 MB/s
Throughput %Diff = -7.3767 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 38015 ms
Time to read chunks = 40693 ms
Big throughput = 56.4904 MB/s
Chunk throughput = 52.8243 MB/s
Throughput %Diff = -6.7074 %
Test = stdio_read
Buffer size = 4 K
Time to read big file = 37365 ms
Time to read chunks = 40887 ms
Big throughput = 57.4731 MB/s
Chunk throughput = 52.5737 MB/s
Throughput %Diff = -8.90427 %
The best chunk throughput is with a buffer size of 64 K. I'm now going to look at how the programs do disk I/O to make sure we get the best performance.
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes.
35 -> 10 minutes
is still an impressive performance boost, but it's much more in line with expectations.
You're now probably also benefiting a little from the fact that processing one file overlaps with processing other files (in a best-case scenario this would reduce total_processing_time to total_processing_time / number_of_cpu_cores).
I'm still expecting network delays/throttling to have some impact, so it might be a good idea to check ACK delays and Nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
Re: [RESOLVED] Multiple thread question
Quote:
I'm still expecting network delays/throttling are having some impact, so it might be a good idea to check ack delays and nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
Thanks.
I've passed your comments to the network people.