Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
If you have the time (and the same curiosity as me), I'd be interested in an
HD Tach screenshot.
Or you could run my I/O profiling code:
http://forums.codeguru.com/showthrea...64#post1801264 (code attached at end of thread). You'll need to run the code posted in the thread first to create the files that profiling code works on.
gg
using stdio_read(), the i/o profiling code gives
Code:
Reading large file...please wait
Reading file chunks...please wait
Big file size = 2147483647
Chunk file size = 10485760
# Chunks = 205
Chunk Total = 2149580800
Test = stdio_read
Buffer size = 4 K
Time to read big file = 433809 ms
Time to read chunks = 453293 ms
Big throughput = 4.9503 MB/s
Chunk throughput = 4.74214 MB/s
Throughput %Diff = -4.29517 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 230442 ms
Time to read chunks = 261077 ms
Big throughput = 9.31898 MB/s
Chunk throughput = 8.23351 MB/s
Throughput %Diff = -12.3682 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 229467 ms
Time to read chunks = 258999 ms
Big throughput = 9.35857 MB/s
Chunk throughput = 8.29957 MB/s
Throughput %Diff = -11.9945 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 234273 ms
Time to read chunks = 262008 ms
Big throughput = 9.16659 MB/s
Chunk throughput = 8.20426 MB/s
Throughput %Diff = -11.0798 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 228264 ms
Time to read chunks = 257756 ms
Big throughput = 9.40789 MB/s
Chunk throughput = 8.3396 MB/s
Throughput %Diff = -12.0389 %
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 229798 ms
Time to read chunks = 257806 ms
Big throughput = 9.34509 MB/s
Chunk throughput = 8.33798 MB/s
Throughput %Diff = -11.3907 %
Using win32_read() gives "GetOverlappedResult failed, error 38" (ERROR_HANDLE_EOF: Reached the end of the file.) when reading file chunks.
Re: Multiple thread question
Quote:
unless each thread decides on his own which files to process,
Each thread only processes one file which is indicated by the function parameter.
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Each thread only processes one file which is indicated by the function parameter.
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of IO operations of the serial version and not to the "singlethreadedness" in itself ...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
Re: Multiple thread question
Quote:
Originally Posted by
superbonzo
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of IO operations of the serial version and not to the "singlethreadedness" in itself ...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
ehm, why not ? it should be very simple to quickly test if this is the case; it's just a matter of splitting a loop in two, isn't it ? in any case, you could also process a fixed number of files ( say ~100 ) at a time to see if it makes a difference ...
Re: Multiple thread question
>> Using win32_read() gives GetOverlappedResult failed error 38
Yes, the overlapped framework I posted in that thread isn't quite right. There's a corrected framework here: http://cboard.cprogramming.com/cplus...ml#post1192325 (To be honest, my MD5 code used to use that broken framework and hit that same bug at a client's site.)
>> Big throughput = 9.40789 MB/s
>> Chunk throughput = 8.3396 MB/s
Those aren't spectacular numbers compared to my PCIe, 2xSataII, RAID 0 setup, which gave high 30's, low 40's. I'm starting to align with OReubens' theory that the NAS device (via its network connectivity) really shines when there are multiple requests being processed at the same time.
gg
Re: Multiple thread question
>> Do I really want a vector of several hundred fstream objects?
You would need a pointer since streams are non-copyable.
gg
Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
You would need a pointer since streams are non-copyable.
.. but they're movable if you have a c++11 compiler, so a vector<fstream> would be ok in that case.
Re: Multiple thread question
Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh!!!!!!!!!!!!!!!
Hold the front page. Stop the presses.
Other gurus have commented that they found these findings extremely peculiar. So do I. That's why I've investigated this deeply, as I don't like peculiar things that can't be explained.
I've pressed the network guys on this who have now done their own investigations and they have found problems. Something to do with incompatibilities between the configuration of the NAS device, the firewalls and the network managed switches. They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes. There are red faces all round in that team. Just as well it's a weekend.:eek:
My thanks to everyone who contributed.
Re: [RESOLVED] Multiple thread question
Cool!
What does win32_read() spit out now?
gg
Re: [RESOLVED] Multiple thread question
These are the new timings using stdio_read again for comparison purposes. The network people have really pulled out all the stops now on this following 'a few words I had with them'. ;)
Code:
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 38932 ms
Time to read chunks = 81665 ms
Big throughput = 55.1599 MB/s
Chunk throughput = 26.3219 MB/s
Throughput %Diff = -70.7837 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 37578 ms
Time to read chunks = 55113 ms
Big throughput = 57.1474 MB/s
Chunk throughput = 39.0032 MB/s
Throughput %Diff = -37.7413 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 38507 ms
Time to read chunks = 41789 ms
Big throughput = 55.7687 MB/s
Chunk throughput = 51.4389 MB/s
Throughput %Diff = -8.07731 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 37595 ms
Time to read chunks = 40514 ms
Big throughput = 57.1215 MB/s
Chunk throughput = 53.0577 MB/s
Throughput %Diff = -7.3767 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 38015 ms
Time to read chunks = 40693 ms
Big throughput = 56.4904 MB/s
Chunk throughput = 52.8243 MB/s
Throughput %Diff = -6.7074 %
Test = stdio_read
Buffer size = 4 K
Time to read big file = 37365 ms
Time to read chunks = 40887 ms
Big throughput = 57.4731 MB/s
Chunk throughput = 52.5737 MB/s
Throughput %Diff = -8.90427 %
The best chunk throughput is with a buffer size of 64 K. I'm now going to look at how the programs do disk I/O to make sure we get the best performance.
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes.
35 -> 10 minutes
is still an impressive performance boost, but it's much more in line with expectations.
You're now probably also benefiting a little from the fact that processing one file overlaps with processing other files (in a best-case scenario this would reduce total_processing_time to total_processing_time / number_of_cpu_cores).
I'm still expecting network delays/throttling to have some impact, so it might be a good idea to check ACK delays and Nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
Re: [RESOLVED] Multiple thread question
Quote:
I'm still expecting network delays/throttling are having some impact, so it might be a good idea to check ack delays and nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
Thanks.
I've passed your comments to the network people.