[RESOLVED] Multiple thread question

Thread: [RESOLVED] Multiple thread question

  1. #1
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    [RESOLVED] Multiple thread question

    We have a program that sequentially processes a large number of files (currently about 700, expected to increase to about 1500). The program performs the same processing on each file (and doesn't involve any other file). This processing is I/O-bound, not CPU-bound. The whole run takes several hours and is normally performed overnight.

    I've refactored the program so that the processing for each file is done within its own thread (ie one thread created for the processing of one file). This gives rise to many hundreds of io-bound threads. This refactored program is working with no errors reported and has reduced the total processing time down to about 10 minutes.

    Does any guru know of any problems that might arise having this number of threads (700 to 1500) created/running?

    Thanks.
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  2. #2
    GCDEF is offline Elite Member Power Poster
    Join Date
    Nov 2003
    Posts
    12,092

    Re: Multiple thread question

    I've never tried it, but I would think at some point the overhead of dealing with a lot of threads would negate the benefits you'd get from using them.

  3. #3
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    The programming overhead in this case is minimal - just a simple loop to create the threads and a vector to hold the thread handles. The only bit of thread synchronisation needed is to deal with displaying error messages and I've put that bit inside a critical section.

    It was really the large number of threads I was wondering about, as I've never used this many before either. But as the time has been reduced from several hours to about 10 minutes, I've been running the program about every hour, and so far there have been no problems and the processing is as expected. I just don't want to be bitten further down the road when we stop using the old program and rely upon this one instead.

    The only issue I've found is that WaitForMultipleObjects() has a limit of MAXIMUM_WAIT_OBJECTS objects for which it can wait - which on my system is 64. As I have a vector of thread handles, this is easily overcome by having a loop that does a WaitForSingleObject() for each handle.
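The approach described in this post can be sketched portably as follows. This is an illustration only: it uses `std::thread` and `std::mutex` from standard C++ in place of the Win32 thread handles and critical section mentioned above, and the names `process_one_thread_per_file` and `FileJob` are hypothetical. The original program's code is not shown in the thread, so this is not the poster's implementation.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-file work; returns false on failure.
using FileJob = std::function<bool(const std::string&)>;

// Starts one thread per file, then waits for each thread in turn.
// Returns the names of the files whose job reported failure.
std::vector<std::string> process_one_thread_per_file(
    const std::vector<std::string>& files, const FileJob& job)
{
    std::mutex error_mutex;            // plays the role of the critical section
    std::vector<std::string> failed;   // shared between threads, guarded by error_mutex
    std::vector<std::thread> workers;
    workers.reserve(files.size());

    for (const std::string& file : files) {
        // Note: thread creation itself can fail (std::system_error) if the
        // system runs out of resources; this sketch does not handle that.
        workers.emplace_back([&error_mutex, &failed, &job, &file] {
            if (!job(file)) {
                std::lock_guard<std::mutex> lock(error_mutex);
                failed.push_back(file);
            }
        });
    }

    // Waiting on the threads one at a time has no 64-object ceiling,
    // in the same way as a loop of WaitForSingleObject() calls.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return failed;
}
```

Waiting on each thread in turn costs nothing extra here, because the loop only finishes once the slowest thread has finished, regardless of the order in which the threads are waited on.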
    Last edited by 2kaud; January 14th, 2014 at 11:27 AM.

  4. #4
    GCDEF is offline Elite Member Power Poster
    Join Date
    Nov 2003
    Posts
    12,092

    Re: Multiple thread question

    I was thinking more of the OS overhead of juggling that many threads. I would think there's a point where the time involved in swapping them in and out has a detrimental effect. Not sure where that point is though.

  5. #5
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    Quote Originally Posted by GCDEF View Post
    I was thinking more of the OS overhead of juggling that many threads. I would think there's a point where the time involved in swapping them in and out has a detrimental effect. Not sure where that point is though.
    I agree that would be a major factor if the threads were cpu-bound, but as they are io-bound the overhead does not seem a problem as the run time has reduced from over 5 hours to about 10 minutes!

  6. #6
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,585

    Re: Multiple thread question

    Since all threads are performing an I/O-bound task, I wouldn't expect OS thread management overhead to be the prevalent bottleneck here: many if not most of the threads would probably be waiting for I/O completion at any given point in time. I'd rather look in the direction of file system and, even more, storage hardware overhead: If all the threads are writing out to the same physical hard disk, then as the number of threads writing in parallel increases there will certainly be a point at which the combination of file system driver and disk hardware fails to manage so many parallel write streams efficiently, resulting in an excessive amount of time spent in head movement, or something similar.

    OTOH, if, hypothetically, each one of the many hundreds of threads had its own physical disk and file system to write to (or mechanical storage overhead would be irrelevant, like with SSDs), then as the thread count increases there'd most probably be a point at which disk interfacing hardware and/or networking becomes the bottleneck.

    At any rate, like almost always, there most probably is some sweet spot regarding thread count, that's delicately determined by a non-trivial combination of factors involved in the concrete scenario. And I'd probably take any bet that this sweet spot is not at one of the ends of the thread count scale...
    Last edited by Eri523; January 14th, 2014 at 04:27 PM.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  7. #7
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    If all the threads are writing out to the same physical hard disk,
    Yes, all the threads are reading/writing from the same physical disk.

    or mechanical storage overhead would be irrelevant, like with SSDs
    Interesting point. I'll look into that, but I seem to recall an issue with SSDs after so many writes? Or has that problem now been solved with the latest SSDs?

  8. #8
    Join Date
    Nov 2003
    Posts
    1,797

    Re: Multiple thread question

    What exactly does this program do with the files? Is it simple enough that it can be posted so others can analyse and make test runs?

    gg

  9. #9
    Join Date
    Jul 2013
    Posts
    272

    Re: Multiple thread question

    Quote Originally Posted by 2kaud View Post
    Yes, all the threads are reading/writing from the same physical disk.
    So the speedup was from 5 hours to 10 minutes, that's about 30 times.

    And that's a lot since a harddisk basically is a serial device. It suggests you have a RAID system with several physical harddisks and that each file is processed by many reads and writes at random positions, because then the Native Command Queuing would work at its best.

    Or maybe you access the harddisk over a network. That would add latency and could explain at least part of the big speed up from multithreading.

    But still, running more than say 16 threads at the same time shouldn't improve the situation much. Rather the opposite due to overhead.

    I would use a thread pool limited to a certain (optional) number of threads. Each thread in the pool processes one file and continues with a new one as long as there are unprocessed files left. Then you can easily check which pool size gives the best total throughput and you avoid the negative effects of starting an enormous number of threads.
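A minimal sketch of the pool described above, assuming standard C++ threads rather than any particular Windows thread-pool API. The function name `process_with_pool` and the `job` callback are hypothetical placeholders for the real per-file processing, which is not shown in this thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Runs job(file) for every file using at most pool_size worker threads.
// Each worker claims the next unprocessed index until none are left.
// Returns how many jobs reported success.
std::size_t process_with_pool(const std::vector<std::string>& files,
                              std::size_t pool_size,
                              const std::function<bool(const std::string&)>& job)
{
    assert(pool_size > 0);
    std::atomic<std::size_t> next{0};       // index of the next file to claim
    std::atomic<std::size_t> succeeded{0};  // count of successful jobs

    auto worker = [&files, &job, &next, &succeeded] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size()) {
                return;                     // no unprocessed files left
            }
            if (job(files[i])) {
                succeeded.fetch_add(1);
            }
        }
    };

    // Never start more threads than there are files.
    const std::size_t thread_count = std::min(pool_size, files.size());
    std::vector<std::thread> pool;
    pool.reserve(thread_count);
    for (std::size_t t = 0; t < thread_count; ++t) {
        pool.emplace_back(worker);
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return succeeded.load();
}
```

With this shape, trying a different pool size is a one-argument change, for example `process_with_pool(files, 16, job)` versus `process_with_pool(files, 64, job)`, so the total run time for each size can be compared directly.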

    Finally, it could be that the refactoring itself solved some issue with the old program. I would write a new program that only processes one file and check what takes time where to have a baseline for further optimizations.

    Here's an article about your topic,

    http://www.drdobbs.com/parallel/mult...0300055?pgno=1
    Last edited by razzle; January 14th, 2014 at 10:45 PM.

  10. #10
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    It suggests you have a RAID system with several physical harddisks and that each file is processed by many reads and writes at random positions, because then the Native Command Queuing would work at its best.

    Or maybe you access the harddisk over a network. That would add latency and could explain at least part of the big speed up from multithreading.
    Yes and yes. It's a RAID 5 NAS device with 4 physical drives in the RAID configuration.

  11. #11
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    3,882

    Re: Multiple thread question

    Quote Originally Posted by 2kaud View Post
    Does any guru know of any problems that might arise having this number of threads (700 to 1500) created/running?
    Reasons against are listed below; to what degree these apply in your specific case, you'll have to figure out on your own.

    A) There is overhead (CPU time and OS resources) in creating/starting and shutting down a thread.
    B) There is very little reason ever to make more threads than you have CPU cores; you'll end up spending a lot of time context switching between threads.
    C) Each thread requires memory for its local stack (you can reduce this to the bare minimum) and other stuff the OS manages.
    D) There may be physical limits imposed on the number of concurrent threads.

    You may not necessarily notice B and C in particular test runs when the total duration per thread is low enough that threads are ending while you're still making new ones.
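To put a rough number on point C, here is a small sketch of the arithmetic. It assumes 1 MiB of address space reserved per thread stack, which is the usual default for programs built with the Microsoft linker unless it is changed with the `/STACK` option or the `dwStackSize` argument of `CreateThread`. The actual setting of the program discussed here is not stated in the thread, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 1 MiB of address space is reserved per thread stack.
constexpr std::uint64_t kAssumedStackReserveMiB = 1;

// Total address space reserved for thread stacks, in MiB.
constexpr std::uint64_t stack_reserve_mib(
    std::uint64_t thread_count,
    std::uint64_t reserve_mib_per_thread = kAssumedStackReserveMiB)
{
    return thread_count * reserve_mib_per_thread;
}

// True if the reserved stacks alone fit within the given address space budget.
constexpr bool stacks_fit(std::uint64_t thread_count, std::uint64_t address_space_mib)
{
    return stack_reserve_mib(thread_count) <= address_space_mib;
}
```

Under that assumption, 1500 threads reserve 1500 MiB for stacks alone. That is reserved address space rather than committed memory, but it is a large share of the 2048 MiB of user address space that a 32-bit Windows process normally has, so the projected growth from 700 to 1500 files is where points C and D would start to matter.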


    Generally speaking, I find your results extremely peculiar. If the many-threads solution takes 10 minutes, this means the I/O takes 10 minutes at most. I can't see any realistic reason why a single-thread solution should take "several hours" if, as you claim, the program is in fact I/O bound.

    If it is both I/O and CPU bound, then multiple threads may solve it. In that case, you should have at most 10 minutes of processing time per CPU core, so your "several hours" would come close assuming you have 12 or more cores. But then the program isn't I/O bound as you said.
    If it was anywhere near memory bound, then multiple threads would have made the problem worse.


    Pure I/O spread over many threads typically shouldn't make things run that much faster compared to 1 thread. After all, your hard disk can only service one request at a time. (Some really advanced servers have disk arrays and controllers that may allow multiple requests to be queued at a time; those tend to be pricey monsters.)


    If your app is indeed purely I/O bound, then overlapped I/O should be the way to make the app more responsive (not necessarily faster).
    If it's CPU and IO bound, then as many threads as you have cores, and overlapped I/O with a "job pooling" system should provide for the best possible response time, overall throughput and keep memory/OS resources to a minimum. This can end up being a rather complex solution though. If 1 thread per file "works" well enough, then by all means stick with it if it's a tool for company use only. If you need something that'll run well on all kinds of machines, then it may not be the best way out.

  12. #12
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    But then the program isn't I/O bound as you said.
    But then the program is I/O bound.

    The app is just a console program that does these file manipulations. There's no user interaction so responsiveness isn't an issue. There's a gui front-end from which the user sets up the parameters, but once the user clicks OK, control is passed to this console program (just like compiling a program under MSVS with the IDE).

    If 1 thread per file "works" well enough, then by all means stick with it if it's a tool for company use only.
    Yes, it's just for internal use. The re-factored program is now in normal use and the users are delighted with the speed increase. No issues have been experienced.

    As I didn't have any experience of programs with this many threads I was just interested if any other guru knew of problems that might bite later.

    I had thought of a 'job pooling' system as plan B if the multiple thread plan A didn't work out. But as plan A is working nicely and plan B would end up as a much more complex solution as noted, I'm sticking with plan A.



    Thanks for the feedback.

  13. #13
    Join Date
    Jul 2013
    Posts
    272

    Re: Multiple thread question

    Quote Originally Posted by 2kaud View Post
    Yes and yes. It's a RAID 5 NAS device with 4 physical drives in the RAID configuration.
    If you have a RAID system, make random accesses to the files and make accesses over a network then the speedup most likely comes from a combination of Native Command Queuing of the harddisk and reduced latency of the network. I can't say which dominates but that can be measured.
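The latency part of this explanation can be illustrated with a toy model. In the sketch below a request is simulated by a sleep, standing in for a network round trip to the NAS; no real I/O is performed, and the names `simulated_request` and `elapsed_ms` are hypothetical. It only shows why overlapping waits shortens the total time, and it says nothing about how the real device behaves under load.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Simulates one high-latency request by sleeping for the given latency.
void simulated_request(std::chrono::milliseconds latency)
{
    std::this_thread::sleep_for(latency);
}

// Issues `requests` simulated requests, at most `concurrency` at a time,
// and returns the total elapsed wall-clock time in milliseconds.
double elapsed_ms(std::size_t requests, std::size_t concurrency,
                  std::chrono::milliseconds latency)
{
    assert(concurrency > 0);
    const auto start = std::chrono::steady_clock::now();

    std::size_t issued = 0;
    while (issued < requests) {
        // Start one batch of overlapping requests, then wait for all of them.
        const std::size_t batch = std::min(concurrency, requests - issued);
        std::vector<std::thread> in_flight;
        in_flight.reserve(batch);
        for (std::size_t i = 0; i < batch; ++i) {
            in_flight.emplace_back(simulated_request, latency);
        }
        for (std::thread& t : in_flight) {
            t.join();
        }
        issued += batch;
    }

    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `elapsed_ms(8, 1, std::chrono::milliseconds(20))` waits for the eight requests one after another, while `elapsed_ms(8, 8, std::chrono::milliseconds(20))` lets all eight waits overlap, so the second call finishes far sooner even though almost no CPU work is involved in either.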

    The naive solution to start one thread per file doesn't scale well. You probably don't have optimal throughput already due to congestion and then there's the increased risk of system failure due to overload.

    A much better solution as I suggested is to introduce a threadpool. Then you can experiment with the pool size to get optimal throughput and there's an upper limit to the number of threads that are used.
    Last edited by razzle; January 15th, 2014 at 11:33 AM.

  14. #14
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,526

    Re: Multiple thread question

    Quote Originally Posted by 2kaud View Post
    But then the program is I/O bound.
    Oh, I so much agree with OReubens - was going to post almost the same opinion.
    Are you 100% sure of your assessments? I am just sooo skeptical that parallel execution can get you a 30 times increase in performance!
    Your numbers just don't add up, at least - not for me.
    If it takes 5 hours to process 700 files - this is about 30 seconds per file. With no CPU usage, what do you do? Copy from one place to another? What is the typical size of your files?
    I am not really familiar with NCQ, but is the depth of its queue 30? Regardless, it only optimizes the search time on disk, not the pure read/write. In my quick googling I found expected performance increase of 9% over non-NCQ systems. Not nearly 30 times!
    This begs two questions:
    1. Has anything else (besides multithreading) changed?
    2. Is the same amount of work performed?
    I understand that your problem is solved at the moment, but if you have a few minutes I would really appreciate your response.
    Who doesn't want to get 30 times performance increase???
    (I have two six-core Xeon HT processors, for the total of 24 parallel threads, so *technically* I could get 24 times increase from multithreading 100% CPU-bound tasks.)
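For reference, the arithmetic behind the figures quoted in this thread (5 hours before, about 10 minutes after, about 700 files) can be written out as below. The helper names are hypothetical, and the inputs are the rounded numbers given by the posters, not measurements.

```cpp
#include <cassert>

// Average wall-clock time spent per file, in seconds.
constexpr double average_seconds_per_file(double total_seconds, int file_count)
{
    return total_seconds / file_count;
}

// Ratio of the old run time to the new run time.
constexpr double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}
```

With 5 hours taken as 18000 seconds and 10 minutes as 600 seconds, the average works out to roughly 25.7 seconds per file sequentially and roughly 0.86 seconds per file in the threaded version, and the ratio of the two run times is 30.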
    Last edited by VladimirF; January 16th, 2014 at 09:41 AM.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinViewer - an integrated GDI objects viewer for Visual C++ Debugger, and more...

  15. #15
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,369

    Re: Multiple thread question

    1. Has anything else (besides multithreading) changed?
    No. The original program simply processed each file sequentially in one thread. The current one processes each file in its own thread.

    2. Is the same amount of work performed?
    Yes. The processing of each file hasn't changed and the code used to perform this processing hasn't really changed. The processing for each file was already in a function, so the only changes made to this function were related to this function now being a thread function.

    Obviously, some CPU time is used in processing each file, but this is very small compared to the file I/O involved. On a 4-core Xeon system (it's quite an old computer), the CPU usage during processing averages about 15% according to Task Manager. It is consuming about 5% network utilisation talking to the NAS RAID 5 disks.

    I suspect the vast performance increase is explained by what razzle wrote in post #13. I also suspect that if the data were held on an internal hard drive using a SATA interface, this magnitude of speedup would not be obtained in this way.

    To be honest, this magnitude of speedup has surprised me. I didn't expect anywhere near it. I also thought that there might be 'issues' with having this many threads hence my original post. Processing one file per thread really was just an experiment to see what happened. I fully expected to have to go to plan B down the route of thread-pooling as others have pointed out. However, as this simple solution is working so effectively now, I'm going to leave it alone.
