Over the last four years I have written a fairly large application (~237 kloc) with a small team of software engineers. The bug that I'm about to describe does not occur in our development environment under Windows, but it does occur on our target OS (Fedora 15 32-bit with boost 1.47). The application has a fair number of threads and uses a lot of boost::condition_variables. However, I have hit a problem where calling condition_variable::wait() in one of the threads uses pretty much 100% of the core on which the thread is running. It's almost as if condition_variable::wait() is running an unrestricted while loop under the hood - clearly something has gone wrong. To rule out the idea that there is something wrong with the classes that carry out the threading: if I take them and test them as a block in isolation from the rest of the application, the code behaves perfectly, so I'm pretty convinced it is not something that I have done... or at least, if it is, it really isn't obvious.
So, I thought I'd ask: has anyone else ever had this problem with condition_variables? If so, what was the solution? I'm pretty much out of ideas. It seems to me that it is a bug in boost or, perhaps even less likely, in the OS (perhaps there is a limit to the number of condition_variables that can be supported and I have exceeded that limit?).
Our target system is an Intel i7 based machine running Fedora 15 (32 bit) with boost 1.47... I guess I could compile the latest version of boost and see if the behaviour is still the same, but past that, I really have no idea what could cause the problem.
Any ideas would be much appreciated.
Last edited by PredicateNormative; February 21st, 2013 at 08:41 AM.
This sounds like a synchronisation issue in the threads. You're probably accessing or modifying a global/shared variable or resource from multiple threads without properly protecting it with a synchronisation object.
There is unfortunately no really easy way out of this other than "old-fashioned" bug hunting with a good debugger, a focussed mind and a lot of time.
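For comparison while bug hunting, here is a minimal sketch of the usual pattern in which the shared state is only ever touched while the mutex is held and the wait uses a predicate. It is not taken from the application: WorkQueue, push, pop and run_demo are hypothetical names, and it uses std::condition_variable from C++11 so that it compiles on its own. boost::condition_variable offers the same wait(lock, predicate) form, so the shape should carry over, but treat that mapping as an assumption to check against your boost version.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical work queue: every access to items_ happens under mutex_.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        // Notify after releasing the lock so the woken thread can take it at once.
        ready_.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks the condition after every wakeup,
        // so spurious wakeups just put the thread back to sleep.
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        int item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
};

// One consumer pops three items while the calling thread pushes 1, 2 and 3.
int run_demo() {
    WorkQueue queue;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < 3; ++i) {
            sum += queue.pop();
        }
    });
    for (int i = 1; i <= 3; ++i) {
        queue.push(i);
    }
    consumer.join();  // join makes the consumer's writes to sum visible here
    return sum;
}
```

If a wait site in the real code reads or writes the condition's data outside the lock, or the waiter and the notifier use different mutexes, that is the kind of place worth looking at first.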
Sorry about my very delayed response - things got pretty manic at work. Due to time constraints I have had to apply a patch (I pretty much re-implemented the offending bit of code in a different, albeit less efficient, way). However, I have raised a ticket against the original file, since I don't like plastering over a problem without understanding the cause.
As yet I have been unable to create a minimal complete app that reproduces the problem - all of the attempts that I have made so far have failed, in that the minimal apps have all worked as expected. This leads me to believe that I might be "barking up the wrong tree" and that the cause of the bug is possibly elsewhere - thankfully they haven't happened too often, but I hate threading bugs!
I tried updating to boost 1.53, but that hasn't made a difference. I'm now doubting that the problem is with boost, but I could be wrong. I have created a number of classes that wrap boost thread to make threading easier and less error prone, and these classes are used everywhere in our code. If these threading classes were the problem, I would expect to have seen the issue before now (they've been around for a good while and have been used in a good many applications), so although I'm not ruling the possibility out, I'm thinking that the threading issue is probably elsewhere.