-
[RESOLVED] Multiple thread question
We have a program that sequentially processes a large number of files (currently about 700, expected to increase to about 1500). The program performs the same processing on each file (and no file depends on any other), and the work is I/O-bound, not CPU-bound. The process takes several hours and is normally run overnight.
I've refactored the program so that the processing for each file is done within its own thread (ie one thread created per file). This gives rise to many hundreds of I/O-bound threads. The refactored program is working with no errors reported and has reduced the total processing time to about 10 minutes.
Does any guru know of any problems that might arise having this number of threads (700 to 1500) created/running?
Thanks.
-
Re: Multiple thread question
I've never tried it, but I would think at some point the overhead of dealing with a lot of threads would negate the benefits you'd get from using them.
-
Re: Multiple thread question
The programming overhead in this case is minimal - just a simple loop to create the threads and a vector to hold the thread handles. The only bit of thread synchronisation needed is to deal with displaying error messages and I've put that bit inside a critical section.
It was really having this large number of threads I was wondering about, as I've never used this many before either. But as the time has been reduced from several hours to about 10 minutes, I've been running the program about every hour, and so far there have been no problems and the processing is as expected. I just don't want to be bitten further down the road when we stop using the old program and rely upon this one instead.
The only issue I've found is that WaitForMultipleObjects() can wait on at most MAXIMUM_WAIT_OBJECTS handles - which the Windows headers define as 64. As I have a vector of thread handles, this is easily overcome by having a loop that does a WaitForSingleObject() on each handle.
-
Re: Multiple thread question
I was thinking more of the OS overhead of juggling that many threads. I would think there's a point where the time involved in swapping them in and out has a detrimental effect. Not sure where that point is though.
-
Re: Multiple thread question
Quote:
Originally Posted by
GCDEF
I was thinking more of the OS overhead of juggling that many threads. I would think there's a point where the time involved in swapping them in and out has a detrimental effect. Not sure where that point is though.
I agree that would be a major factor if the threads were cpu-bound, but as they are io-bound the overhead does not seem a problem as the run time has reduced from over 5 hours to about 10 minutes!
-
Re: Multiple thread question
Since all the threads are performing an I/O-bound task, I wouldn't expect OS thread management overhead to be the prevalent bottleneck here: many if not most of the threads will probably be waiting for I/O completion at any given point in time. I'd rather think in the direction of file system and, even more, storage hardware overhead. If all the threads are writing out to the same physical hard disk, there will certainly be a point, as the number of concurrently writing threads increases, when the combination of file system driver and disk hardware fails to manage so many parallel write streams efficiently, resulting in an excessive amount of time spent in head movement, or something similar.
OTOH, if, hypothetically, each one of the many hundreds of threads had its own physical disk and file system to write to (or mechanical storage overhead would be irrelevant, like with SSDs), there'd most probably be a point in increasing the thread count when disk interfacing hardware and/or networking will become a bottleneck.
At any rate, as almost always, there most probably is some sweet spot for the thread count, delicately determined by a non-trivial combination of factors in the concrete scenario. And I'd probably take any bet that this sweet spot is not at either end of the thread count scale...
-
Re: Multiple thread question
Quote:
If all the threads are writing out to the same physical hard disk,
Yes, all the threads are reading/writing from the same physical disk.
Quote:
or mechanical storage overhead would be irrelevant, like with SSDs
Interesting point. I'll look into that - but I seem to recall an issue with SSDs wearing out after so many writes? Or has that problem now been solved with the latest SSDs?
-
Re: Multiple thread question
What exactly does this program do with the files? Is it simple enough that it can be posted so others can analyse and make test runs?
gg
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Yes, all the threads are reading/writing from the same physical disk.
So the speedup was from 5 hours to 10 minutes, that's about 30 times.
And that's a lot since a harddisk basically is a serial device. It suggests you have a RAID system with several physical harddisks and that each file is processed by many reads and writes at random positions, because then the Native Command Queuing would work at its best.
Or maybe you access the harddisk over a network. That would add latency and could explain at least part of the big speed up from multithreading.
But still, running more than say 16 threads at the same time shouldn't improve the situation much. Rather the opposite due to overhead.
I would use a thread pool limited to a certain (optional) number of threads. Each thread in the pool processes one file and continues with a new one as long as there are unprocessed files left. Then you can easily check which pool size gives the best total throughput and you avoid the negative effects of starting an enormous number of threads.
Finally, it could be that the refactoring itself solved some issue with the old program. I would write a new program that only processes one file and check what takes time where to have a baseline for further optimizations.
Here's an article about your topic,
http://www.drdobbs.com/parallel/mult...0300055?pgno=1
-
Re: Multiple thread question
Quote:
It suggests you have a RAID system with several physical harddisks and that each file is processed by many reads and writes at random positions, because then the Native Command Queuing would work at its best.
Or maybe you access the harddisk over a network. That would add latency and could explain at least part of the big speed up from multithreading.
Yes and yes. It's a RAID 5 NAS device with 4 physical drives in the RAID configuration.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Does any guru know of any problems that might arise having this number of threads (700 to 1500) created/running?
There are several reasons against; to what degree these apply in your specific case, you'll have to figure out on your own.
A) There is overhead (CPU time and OS resources) in creating/starting and shutting down a thread.
B) There is very little reason ever to make more threads than you have CPU cores; beyond that you'll end up spending a lot of time context switching between threads.
C) Each thread requires memory for its local stack (you can reduce this to a bare minimum) and other stuff the OS manages.
D) There may be OS-imposed limits on the number of concurrent threads.
You may not necessarily notice B and C in particular test runs when the total duration per thread is low enough that threads are ending while you're still making new ones.
Generally speaking, I'm finding your findings extremely peculiar. If the many-threads solution takes 10 minutes, this means the I/O takes 10 minutes at most. I can't see any realistic reason why a single-threaded solution should take "several hours" if, in fact, as you claim, the program is I/O bound.
If it is both I/O and CPU bound, then multiple threads may solve it. In that case, you should have at most 10 minutes of processing time per CPU core, so your "several hours" would get close assuming you have 12 or more cores. But then the program isn't I/O bound as you said.
If it was anywhere near memory bound, then multiple threads would have made the problem worse.
Pure I/O over many threads typically shouldn't make things run that much faster compared to 1 thread. After all, your hard disk can only service one request at a time. (Some really advanced servers have disk arrays and controllers that allow multiple requests to be queued at a time, but those tend to be pricey monsters.)
If your app is indeed purely I/O bound, then overlapped I/O should be the way to make the app more responsive (not necessarily faster).
If it's CPU and IO bound, then as many threads as you have cores, and overlapped I/O with a "job pooling" system should provide for the best possible response time, overall throughput and keep memory/OS resources to a minimum. This can end up being a rather complex solution though. If 1 thread per file "works" well enough, then by all means stick with it if it's a tool for company use only. If you need something that'll run well on all kinds of machines, then it may not be the best way out.
-
Re: Multiple thread question
Quote:
But then the program isn't I/O bound as you said.
But then the program is I/O bound.
The app is just a console program that does these file manipulations. There's no user interaction so responsiveness isn't an issue. There's a gui front-end from which the user sets up the parameters, but once the user clicks OK, control is passed to this console program (just like compiling a program under MSVS with the IDE).
Quote:
If 1 thread per file "works" well enough, then by all means stick with it if it's a tool for company use only.
Yes, it's just for internal use. The re-factored program is now in normal use and the users are delighted with the speed increase. No issues have been experienced.
As I didn't have any experience of programs with this many threads I was just interested if any other guru knew of problems that might bite later.
I had thought of a 'job pooling' system as plan B if the multiple thread plan A didn't work out. But as plan A is working nicely and plan B would end up as a much more complex solution as noted, I'm sticking with plan A.
Thanks for the feedback.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Yes and yes. It's a RAID 5 NAS device with 4 physical drives in the RAID configuration.
If you have a RAID system, make random accesses to the files and make accesses over a network then the speedup most likely comes from a combination of Native Command Queuing of the harddisk and reduced latency of the network. I can't say which dominates but that can be measured.
The naive solution of starting one thread per file doesn't scale well. You probably don't have optimal throughput as it is, due to congestion, and then there's the increased risk of system failure due to overload.
A much better solution as I suggested is to introduce a threadpool. Then you can experiment with the pool size to get optimal throughput and there's an upper limit to the number of threads that are used.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
But then the program is I/O bound.
Oh, I so much agree with OReubens - was going to post almost the same opinion.
Are you 100% sure of your assessments? I am just sooo skeptical that parallel execution can get you a 30-times increase in performance!
Your numbers just don't add up, at least - not for me.
If it takes 5 hours to process 700 files - that's about 26 seconds per file. With no CPU usage, what do you do? Copy from one place to another? What is the typical size of your files?
I am not really familiar with NCQ, but is the depth of its queue 30? Regardless, it only optimizes the search time on disk, not the pure read/write. In my quick googling I found expected performance increase of 9% over non-NCQ systems. Not nearly 30 times!
This raises two questions:
1. Has anything else (besides multithreading) changed?
2. Is the same amount of work performed?
I understand that your problem is solved at the moment, but if you have a few minutes I would really appreciate your response.
Who doesn't want to get 30 times performance increase??? :)
(I have two six-core Xeon HT processors, for the total of 24 parallel threads, so *technically* I could get 24 times increase from multithreading 100% CPU-bound tasks.)
-
Re: Multiple thread question
Quote:
1. Has anything else (besides multithreading) changed?
No. The original program simply processed each file sequentially in one thread. The current one processes each file in its own thread.
Quote:
Is the same amount of work performed?
2) Yes. The processing of each file hasn't changed and the code used to perform this processing hasn't really changed. The processing for each file was already in a function, so the only changes made to this function were related to this function now being a thread function.
Obviously, there is some CPU usage for each file's processing, but this is very small compared to the file I/O involved. On a 4-core Xeon system (it's quite an old computer), the CPU usage during processing averages about 15% according to Task Manager. It is consuming about 5% network utilisation talking to the NAS RAID 5 disks.
I suspect the answer to the vast performance increase is due to the explanation given by razzle in post #13. I also suspect that if the data was held on an internal hard drive using SATA interface etc then I doubt very much if this magnitude of speedup would be obtained in this way.
To be honest, this magnitude of speedup has surprised me. I didn't expect anywhere near it. I also thought that there might be 'issues' with having this many threads hence my original post. Processing one file per thread really was just an experiment to see what happened. I fully expected to have to go to plan B down the route of thread-pooling as others have pointed out. However, as this simple solution is working so effectively now, I'm going to leave it alone.
-
Re: Multiple thread question
Check out the QueueUserWorkItem API, which will manage the threads for you. Try it, see if the performance is the same as what you are seeing with manually managed threads, and if it is, never look back.
-
Re: Multiple thread question
Quote:
Originally Posted by
Arjay
Check out the QueueUserWorkItem API, which will manage the threads for you. Try it, see if the performance is the same as what you are seeing with manually managed threads, and if it is, never look back.
Thanks.
I'll experiment and report results.
-
Re: Multiple thread question
Quote:
Originally Posted by
Arjay
Check out the QueueUserWorkItem API, which will manage the threads for you. Try it, see if the performance is the same as what you are seeing with manually managed threads, and if it is, never look back.
Yes the performance is the same! The only slight complication is now knowing when all the files have been processed (XP).
PS The maximum thread count according to Task Manager is 6!
-
Re: Multiple thread question
With regard to determining when all the files have been processed..
If the number of files doesn't change during the processing, simply count the files before processing and decrement the count as each file is processed by QueueUserWorkItem. Use the InterlockedXxx APIs to change the count variable safely. When it reaches 0, you are finished.
-
Re: Multiple thread question
since this is over network...
I suspect the many-threads solution is benefitting from the fact that the I/O requests to the server (and possibly the return route) get packaged/merged into single network packets. Sending stuff faster/more often may also largely override the innate network throttling (see "Nagle's algorithm" and "delayed (network) acknowledgement"). With stuff piling up on the sending side of the network stack, all the delays get cancelled and the Nagle algorithm is forced out.
I'm not really seeing a 30-fold increase in this though. Maybe 2x or 3x, even combining this with command queuing and multiple I/O requests over RAID. 30x is still a long way off.
If you have an app that is file-I/O bound over the network, then the real down-to-earth question is: why are you running this over the network in the first place? This is asking for either a service on the server, command remoting, RDP or some other means of running the actual processing on the server side, literally avoiding the network issues entirely.
But again, if what you have now works for you, "fine". You may however find out later that changes in hardware/software on the client side, the server side or somewhere on the network have drastic effects on the run time. That many simultaneous threads is not a reliable model to build on.
As I said before, for an "in-house" solution it may be good enough (for now), but I wouldn't expect too much of it if you wanted this in a commercial product that needs to run on a wide variety of clients/servers/network hardware.
-
Re: Multiple thread question
Quote:
Originally Posted by
Arjay
With regard to determining when all the files have been processed..
If the number of files doesn't change during the processing, simply count the files before processing and decrement the count as each file is processed by QueueUserWorkItem. Use the InterlockedXxx APIs to change the count variable safely. When it reaches 0, you are finished.
Yes, that's what I've done. In the processing thread I use InterlockedDecrement(), and if the result is 0 I signal an event. In main() I use WaitForSingleObject() on that event. This works OK. The only downside is that I've now got a dependency between the thread function and main() which I didn't have before. As I said, a slight complication - but overall probably a more stable solution, as it will probably give optimum throughput even if the data is stored on local disks rather than networked disks. I doubt my original experimental solution would give the same level of performance improvement on local disks.
Interesting results though, demonstrating the power of using multi-threading for i/o bound programs.
-
Re: Multiple thread question
Quote:
If you have an app that is file-I/O bound over the network, then the real down-to-earth question is: why are you running this over the network in the first place? This is asking for either a service on the server, command remoting, RDP or some other means of running the actual processing on the server side, literally avoiding the network issues entirely.
The data is stored on a RAID5 NAS device attached to the LAN. Irrespective of from where/how the program is run, the data still comes over the LAN.
I have, however, asked the network/systems people to look at the NAS device configuration and its performance to make sure we haven't got an undiagnosed problem there.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
I suspect the answer to the vast performance increase is due to the explanation given by razzle in post #13.
I have another crazy idea. Could it be that your I/O requests to that NAS were so sparse that it was dropping into standby/sleep mode and had to wake up for the next I/O? That could explain your 30-fold bump, as spinning up the drives surely takes much more time than reading a cluster or two.
-
Re: Multiple thread question
Quote:
Originally Posted by
VladimirF
I have another crazy idea. Could it be that your I/O requests to that NAS were so sparse that it was dropping into standby/sleep mode and had to wake up for the next I/O? That could explain your 30-fold bump, as spinning up the drives surely takes much more time than reading a cluster or two.
?? Don't know. Not my area. Another question for the network/systems people. But I'm suspicious that with the new program running the network utilisation is only 5%.
-
Re: Multiple thread question
For curiosity's sake, did you try using the WT_SET_MAX_THREADPOOL_THREADS(Flags, Limit) macro to set Limit to just one thread, to see if the problem was in the original code? Or simply measure the speedup for 1 <= Limit <= 6?
-
Re: Multiple thread question
Quote:
Originally Posted by
superbonzo
For curiosity's sake, did you try using the WT_SET_MAX_THREADPOOL_THREADS(Flags, Limit) macro to set Limit to just one thread, to see if the problem was in the original code? Or simply measure the speedup for 1 <= Limit <= 6?
No. Good point. I'll investigate.
-
Re: Multiple thread question
Well, that was interesting. I used the macro to set the Limit to 1 thread and ....... the program completed in just over 10 minutes. Not believing this, I rebuilt the solution and tried again with the same results. Doing it again for the third time I found that the program is still using multiple threads according to Task Manager. So setting the Limit to 1 using the macro to set the Flags parameter basically has no effect.
Reading the API documentation, it says
"By default, the thread pool has a maximum of 512 threads per process. To raise the queue limit, use the WT_SET_MAX_THREADPOOL_THREAD macro defined in Winnt.h."
So it looks like you can't use this to reduce the number of threads used, just to increase it.
-
Re: Multiple thread question
If you have the time (and the same curiosity as me), I'd be interested in an HD Tach screenshot.
Or you could run my I/O profiling code: http://forums.codeguru.com/showthrea...64#post1801264 (code attached at end of thread). You'll need to run the code posted in the thread first to create the files that profiling code works on.
gg
-
Re: Multiple thread question
Update to my post #27. I replaced the call to QueueUserWorkItem() with a direct call to the thread function - effectively making the program single thread again (confirmed by Task Manager). The time taken is back up to several hours again - so the issue is not a problem in the original code as expected.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Reading the API documentation, it says
"By default, the thread pool has a maximum of 512 threads per process. To raise the queue limit, use the WT_SET_MAX_THREADPOOL_THREAD macro defined in Winnt.h."
... uhm, but the very next sentence reads "Note that your application can improve its performance by keeping the number of worker threads low"... God knows :)
Quote:
Originally Posted by 2kaud
[...] so the issue is not a problem in the original code as expected.
unless each thread decides on its own which files to process, it could still be a problem with the original code; a thread pool with a single thread and a single thread are not the same thing, because the sequence of I/O operations is different. Supposedly, in the former, main() fills a queue of file paths and the worker concurrently processes them in order; in the latter, file paths are gathered and processed serially. I'm not network savvy, but I vaguely recall a similar problem while accessing a (badly configured) Samba share from Windows... (yes, I know this is not very helpful :) )
moreover, it would be interesting to see how the speedup scales with the number of pool threads.
-
Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
If you have the time (and the same curiosity as me), I'd be interested in an
HD Tach screenshot.
Or you could run my I/O profiling code:
http://forums.codeguru.com/showthrea...64#post1801264 (code attached at end of thread). You'll need to run the code posted in the thread first to create the files that profiling code works on.
gg
Using stdio_read(), the I/O profiling code gives
Code:
Reading large file...please wait
Reading file chunks...please wait
Big file size = 2147483647
Chunk file size = 10485760
# Chunks = 205
Chunk Total = 2149580800
Test = stdio_read
Buffer size = 4 K
Time to read big file = 433809 ms
Time to read chunks = 453293 ms
Big throughput = 4.9503 MB/s
Chunk throughput = 4.74214 MB/s
Throughput %Diff = -4.29517 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 230442 ms
Time to read chunks = 261077 ms
Big throughput = 9.31898 MB/s
Chunk throughput = 8.23351 MB/s
Throughput %Diff = -12.3682 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 229467 ms
Time to read chunks = 258999 ms
Big throughput = 9.35857 MB/s
Chunk throughput = 8.29957 MB/s
Throughput %Diff = -11.9945 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 234273 ms
Time to read chunks = 262008 ms
Big throughput = 9.16659 MB/s
Chunk throughput = 8.20426 MB/s
Throughput %Diff = -11.0798 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 228264 ms
Time to read chunks = 257756 ms
Big throughput = 9.40789 MB/s
Chunk throughput = 8.3396 MB/s
Throughput %Diff = -12.0389 %
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 229798 ms
Time to read chunks = 257806 ms
Big throughput = 9.34509 MB/s
Chunk throughput = 8.33798 MB/s
Throughput %Diff = -11.3907 %
Using win32_read() gives "GetOverlappedResult failed error 38" (ERROR_HANDLE_EOF: Reached the end of the file) when reading the file chunks.
-
Re: Multiple thread question
Quote:
unless each thread decides on its own which files to process,
Each thread only processes one file which is indicated by the function parameter.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
Each thread only processes one file which is indicated by the function parameter.
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of I/O operations in the serial version, and not to the "single-threadedness" in itself...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
-
Re: Multiple thread question
Quote:
Originally Posted by
superbonzo
exactly, then the serial code will do <get_file,parse_file,get_file,...> while a thread pool with a single worker will do <get_file,get_file,parse_file,get_file,get_file,get_file,....,parse_file,get_file,....[all files queued],parse_file,parse_file,....>. Now, I was conjecturing that the slowdown could be due to the peculiar interleaving of I/O operations in the serial version, and not to the "single-threadedness" in itself...
in other words, given the single threaded code, what happens if you collect all files once ( say, put open handles in a vector ) and *then* parse them serially ?
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
The file processing is done using c++ streams. Do I really want a vector of several hundred fstream objects?
ehm, why not? It should be very simple to quickly test if this is the case; it's just a matter of splitting a loop in two, isn't it? In any case, you could also process the files a fixed number (say ~100) at a time to see if it makes a difference...
-
Re: Multiple thread question
>> Using win32_read() gives GetOverlappedResult failed error 38
Yes, the overlapped framework I posted in that thread isn't quite right. There is a correct framework here: http://cboard.cprogramming.com/cplus...ml#post1192325 (To be honest, my MD5 code used to use that wrong framework and experienced this same bug at a client's site.)
>> Big throughput = 9.40789 MB/s
>> Chunk throughput = 8.3396 MB/s
Those aren't spectacular #'s compared to my PCIe, 2xSataII, RAID 0 setup, which gave high 30's, low 40's. I'm starting to align with OReubens' theory that the NAS device (via its network connectivity) really shines when there are multiple requests being processed at the same time.
gg
-
Re: Multiple thread question
>> Do I really want a vector of several hundred fstream objects?
You would need a pointer since streams are non-copyable.
gg
-
Re: Multiple thread question
Quote:
Originally Posted by
Codeplug
You would need a pointer since streams are non-copyable.
... but they're movable if you have a C++11 compiler, so a vector<fstream> would be OK in that case.
-
Re: Multiple thread question
Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh!!!!!!!!!!!!!!!
Hold the front page. Stop the presses.
As other gurus have commented, they found these findings extremely peculiar. So do I. That's why I've investigated this deeply as I don't like peculiar things that can't be explained.
I've pressed the network guys on this who have now done their own investigations and they have found problems. Something to do with incompatibilities between the configuration of the NAS device, the firewalls and the network managed switches. They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes. There are red faces all round in that team. Just as well it's a weekend.:eek:
My thanks to everyone who contributed.
-
Re: [RESOLVED] Multiple thread question
Cool!
What does win32_read() spit out now?
gg
-
Re: [RESOLVED] Multiple thread question
These are the new timings using stdio_read again for comparison purposes. The network people have really pulled out all the stops now on this following 'a few words I had with them'. ;)
Code:
Test = stdio_read
Buffer size = 4096 K
Time to read big file = 38932 ms
Time to read chunks = 81665 ms
Big throughput = 55.1599 MB/s
Chunk throughput = 26.3219 MB/s
Throughput %Diff = -70.7837 %
Test = stdio_read
Buffer size = 1024 K
Time to read big file = 37578 ms
Time to read chunks = 55113 ms
Big throughput = 57.1474 MB/s
Chunk throughput = 39.0032 MB/s
Throughput %Diff = -37.7413 %
Test = stdio_read
Buffer size = 128 K
Time to read big file = 38507 ms
Time to read chunks = 41789 ms
Big throughput = 55.7687 MB/s
Chunk throughput = 51.4389 MB/s
Throughput %Diff = -8.07731 %
Test = stdio_read
Buffer size = 64 K
Time to read big file = 37595 ms
Time to read chunks = 40514 ms
Big throughput = 57.1215 MB/s
Chunk throughput = 53.0577 MB/s
Throughput %Diff = -7.3767 %
Test = stdio_read
Buffer size = 32 K
Time to read big file = 38015 ms
Time to read chunks = 40693 ms
Big throughput = 56.4904 MB/s
Chunk throughput = 52.8243 MB/s
Throughput %Diff = -6.7074 %
Test = stdio_read
Buffer size = 4 K
Time to read big file = 37365 ms
Time to read chunks = 40887 ms
Big throughput = 57.4731 MB/s
Chunk throughput = 52.5737 MB/s
Throughput %Diff = -8.90427 %
For chunk throughput, the best result is with a buffer size of 64K. I'm now going to look at how the programs do disk I/O to make sure we get the best performance.
-
Re: Multiple thread question
Quote:
Originally Posted by
2kaud
They've adjusted some parameters (don't ask). Using the new threaded program the timings are about the same (maybe slightly faster) - but using the old program the process time has come down to about 35 minutes.
35 -> 10 minutes
is still an impressive performance boost, but it's much more in line with expectations.
You're now probably benefitting a little from the fact that processing of one file overlaps with the processing of other files (in the best-case scenario this would reduce total_processing_time to total_processing_time / number_of_cpu_cores).
I'm still expecting network delays/throttling are having some impact, so it might be a good idea to check ack delays and nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
-
Re: [RESOLVED] Multiple thread question
Quote:
I'm still expecting network delays/throttling are having some impact, so it might be a good idea to check ack delays and nagle settings towards the NAS.
Command queueing and simultaneous IO (if the NAS is capable thereof) would also contribute a small amount.
Thanks.
I've passed your comments to the network people.