Application of Multithreading


Kaikyro
September 15th, 2011, 06:08 AM
Hey guys, I'm looking for some high-level advice.

I'm heading up development on an existing specialist in-memory database. One of the things we are looking into to improve concurrent performance is executing user requests on separate threads. The biggest problem (the typical database problem) is that all of those requests will be accessing the same shared set of data objects.

The core objects form a tree. None of the containers we use sort their objects; each container stores pointers to its child objects, which in turn store pointers to their own children. Because only pointers are stored, the objects themselves are never moved around in memory, which some STL containers do to elements they hold by value (according to my limited understanding).
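To illustrate the pointer-based layout described above, here is a minimal sketch assuming modern C++ (C++14's `std::make_unique`, which VS2008 does not provide) and a hypothetical `Node` type. When the `children` vector grows it may relocate the `std::unique_ptr` handles, but each `Node` stays at its original address, so raw pointers to nodes held elsewhere remain valid.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: each node owns its children through pointers,
// mirroring the "containers store pointers to child objects" layout.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string n) : name(std::move(n)) {}

    // Appends a child and returns a raw pointer to it. When the vector
    // reallocates, it moves the unique_ptr handles, not the Node objects,
    // so pointers returned earlier stay valid.
    Node* addChild(const std::string& childName) {
        children.push_back(std::make_unique<Node>(childName));
        return children.back().get();
    }
};
```

Storing `Node` objects directly in a `std::vector<Node>` would not give this guarantee, since reallocation moves the elements themselves.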

I understand the multithreading basics, but almost all samples and tutorials use threads to execute isolated tasks. As I understand it, that is only partly relevant here: I will create a worker class, but my main focus has to be the most efficient way to share these complex data structures between reader and writer threads.
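The simplest correct baseline for sharing one data structure between reader and writer threads is a single reader/writer lock around all of it. A minimal sketch, assuming C++17's `std::shared_mutex` (not available in VS2008) and using a `std::map` as a hypothetical stand-in for the real object tree; `CoarseLockedStore` is an illustrative name:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical store: one reader/writer lock guards the whole data set.
// Any number of readers may hold the shared lock at once; a writer takes
// the exclusive lock and waits until all readers have left.
class CoarseLockedStore {
public:
    std::optional<int> read(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: readers run concurrently
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    void write(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: blocks readers and writers
        data_[key] = value;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> data_;  // stands in for the real object tree
};
```

This is easy to get right, but every write stalls every reader, so it only scales while writes are rare. Finer-grained schemes trade that simplicity for more concurrency.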

My current plan is to add a semaphore counter to each node of the tree, so that I can lock a branch from that node downwards (through all of its sub-branches) for a write operation, i.e. while building the branches and the data value at the end. I still need to learn exactly how this would work, though.
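One way the per-node locking idea could look is sketched below, assuming C++17's `std::shared_mutex` as the per-node lock. A reader/writer lock roughly plays the role of the semaphore counter: it counts readers and lets a single writer exclude everyone else. On VS2008 the equivalent would have to come from TBB's reader/writer mutexes (e.g. `tbb::spin_rw_mutex`) or Win32 SRW locks. Threads descend from the root using hand-over-hand locking (also called lock coupling): lock the child, then release the parent. `TreeNode`, `LockCoupledTree`, and the key-path interface are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node with its own reader/writer lock.
struct TreeNode {
    mutable std::shared_mutex mutex;
    std::optional<int> value;
    std::map<std::string, std::unique_ptr<TreeNode>> children;
};

class LockCoupledTree {
public:
    // Reader: shared locks, acquired hand-over-hand from the root.
    std::optional<int> read(const std::vector<std::string>& path) const {
        std::shared_lock<std::shared_mutex> held(root_.mutex);
        const TreeNode* node = &root_;
        for (const auto& key : path) {
            auto it = node->children.find(key);
            if (it == node->children.end()) return std::nullopt;
            const TreeNode* child = it->second.get();
            std::shared_lock<std::shared_mutex> next(child->mutex);  // lock the child first...
            held = std::move(next);                                  // ...then release the parent
            node = child;
        }
        return node->value;
    }

    // Writer: exclusive locks in the same top-down order, creating
    // missing nodes along the path (nodes are never removed here).
    void write(const std::vector<std::string>& path, int value) {
        std::unique_lock<std::shared_mutex> held(root_.mutex);
        TreeNode* node = &root_;
        for (const auto& key : path) {
            auto& slot = node->children[key];  // safe: we hold node exclusively
            if (!slot) slot = std::make_unique<TreeNode>();
            TreeNode* child = slot.get();
            std::unique_lock<std::shared_mutex> next(child->mutex);
            held = std::move(next);
            node = child;
        }
        node->value = value;
    }

private:
    TreeNode root_;
};
```

Because every thread enters at the root and descends in the same order, a writer holding a node exclusively blocks everyone heading into the branch below it, and the fixed top-down order rules out lock cycles. The cost is that every write briefly takes the root exclusively, so writers still queue up near the top of the tree.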

Can anyone give me some high-level advice for where I am in the design stage? I'm looking for a nudge in the right direction. The whole application was developed without any forethought for multithreading, so as I understand it, this will be quite a hard task.

Edit: I should add that we use Visual Studio 2008 with the Intel C++ compiler, and we will be looking at Intel Threading Building Blocks (TBB) for its thread-safe containers.

D_Drmmr
September 16th, 2011, 02:37 AM
Quote (Kaikyro): "Can anyone provide some high level advice for where I'm at presently in the design stage?"

I'd say you're at the beginning. If you want to harness the power of multithreading, you'll have to come up with a solid design for it. Bolting worker threads onto an inherently single-threaded design is unlikely to be rewarding.

What you need to do is identify which tasks need to be executed, which of them are independent (i.e. can be executed in parallel), and which need to be synchronized in some way. I guess identifying the tasks is fairly straightforward in this case: each query to the database is a task. Which tasks are independent will depend on the structure of the database.
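One way to make "which tasks are independent" concrete is the classic read/write conflict rule. The sketch below assumes that each query can declare up front which objects it reads and writes (a hypothetical `QueryFootprint` keyed by object name). Two queries may run in parallel unless one writes something the other reads or writes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical description of a query: which objects it reads and writes.
struct QueryFootprint {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two sets share at least one element.
static bool intersects(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const auto& x : a)
        if (b.count(x)) return true;
    return false;
}

// Two queries are independent (may run in parallel) unless one writes an
// object the other reads or writes. Concurrent reads never conflict.
bool independent(const QueryFootprint& q1, const QueryFootprint& q2) {
    return !intersects(q1.writes, q2.writes) &&
           !intersects(q1.writes, q2.reads) &&
           !intersects(q2.writes, q1.reads);
}
```

In a tree-shaped database the footprint could be coarser than individual objects, for example the subtree a query touches, which makes the check cheaper at the cost of reporting more false conflicts.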

The next step is to figure out how to synchronize dependent tasks. You may be able to achieve this within your data structure (e.g. by locking nodes in the tree), but it may be better to design and implement a task scheduler. A well-designed scheduler is the key to maximizing the scalability of your application, i.e. to making the fullest use of the available processing power. A good scheduler will also greatly reduce the need for low-level synchronization with mutexes and the like.
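As one possible illustration of a scheduler removing the need for low-level locking, here is a minimal sketch of data partitioning with one owner thread per partition. This is an assumed design, not necessarily what the reply has in mind. Each task carries a partition key (for example, the top-level subtree it touches), and tasks with the same key always run on the same worker, one at a time and in submission order, so they need no locks among themselves. A query spanning several partitions would need extra coordination that is not shown. `PartitionedScheduler` is a hypothetical name and the code assumes C++17.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler: one worker thread and one task queue per partition slot.
class PartitionedScheduler {
public:
    explicit PartitionedScheduler(std::size_t workers) : queues_(workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this, i] { run(queues_[i]); });
    }

    ~PartitionedScheduler() {
        for (auto& q : queues_) {
            std::lock_guard<std::mutex> lock(q.mutex);
            q.stopping = true;
            q.ready.notify_one();
        }
        for (auto& t : threads_) t.join();  // each worker drains its queue before exiting
    }

    // Tasks with the same key always land on the same worker's queue.
    void submit(const std::string& partitionKey, std::function<void()> task) {
        Queue& q = queues_[std::hash<std::string>{}(partitionKey) % queues_.size()];
        std::lock_guard<std::mutex> lock(q.mutex);
        q.tasks.push_back(std::move(task));
        q.ready.notify_one();
    }

private:
    struct Queue {
        std::mutex mutex;
        std::condition_variable ready;
        std::deque<std::function<void()>> tasks;
        bool stopping = false;
    };

    static void run(Queue& q) {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(q.mutex);
                q.ready.wait(lock, [&q] { return q.stopping || !q.tasks.empty(); });
                if (q.tasks.empty()) return;  // stopping and nothing left to do
                task = std::move(q.tasks.front());
                q.tasks.pop_front();
            }
            task();  // run outside the queue lock
        }
    }

    std::vector<Queue> queues_;
    std::vector<std::thread> threads_;
};
```

The only mutex here protects the queue hand-off; the data inside a partition is touched by a single thread. How well this scales depends on how evenly queries spread across partition keys.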

Igor Vartanov
September 25th, 2011, 01:20 AM
One more thing needs to be said, in case the database engine being developed is a relational one. With multithreading, and therefore concurrent access to the same data, another problem emerges sooner or later: consistency of record/object data, visibility of changes made in transactions, and so on. Neither multithreading nor a scheduler solves this. The simplest approach is to lock tables/records/objects, but this diminishes the benefit of multithreading when multiple clients are intensively working on the same set of records/objects.
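A minimal sketch of the record-locking approach, assuming each record carries its own `std::mutex` (the `Record`, `transfer`, `consistentTotal`, and `totalStaysConstant` names are hypothetical, and `std::scoped_lock` requires C++17). A two-record update is applied atomically, so no reader ever sees it half done. The same sketch also shows the cost mentioned above: every operation on these two records serializes on their locks.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical record with its own lock.
struct Record {
    std::mutex mutex;
    long balance = 0;
};

// Moves an amount between two records as one atomic step. std::scoped_lock
// acquires both mutexes with a deadlock-avoidance algorithm, so opposite
// transfers (a->b and b->a) cannot deadlock each other.
void transfer(Record& from, Record& to, long amount) {
    assert(&from != &to);  // locking the same std::mutex twice is undefined
    std::scoped_lock lock(from.mutex, to.mutex);
    from.balance -= amount;
    to.balance += amount;
}

// Reads both records under their locks, so the caller never sees a
// half-applied transfer (one side updated, the other not).
long consistentTotal(Record& a, Record& b) {
    std::scoped_lock lock(a.mutex, b.mutex);
    return a.balance + b.balance;
}

// Two threads move units in opposite directions while a third checks that
// the total never changes. Returns true if every check saw the same total.
bool totalStaysConstant(Record& a, Record& b, int iterations) {
    const long expected = consistentTotal(a, b);
    std::atomic<bool> ok{true};
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    std::thread checker([&] {
        for (int i = 0; i < iterations; ++i)
            if (consistentTotal(a, b) != expected) ok = false;
    });
    t1.join();
    t2.join();
    checker.join();
    return ok;
}
```

If many clients hammer the same few records, their threads spend most of their time waiting on these mutexes, which is the loss of parallelism described above.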