Great topic for a special thread.

My name is Chris. For many years I have been developing all kinds of parallel applications for a wide range of microprocessor systems, as well as high-performance parallel algorithms.

There are several technologies available for implementing parallel processing in C++. One can create threads explicitly, and their portability should improve drastically once C++0x standardizes a thread library. One can also use OpenMP for selected, specially prepared code sequences. And one can even enable the automatic parallelization options of Intel's compiler, which searches for code sequences that might benefit from parallelization and then parallelizes them itself.
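To make the first two approaches concrete, here is a minimal sketch that sums a vector once with explicitly created threads and once with an OpenMP directive. It assumes the std::thread interface proposed for C++0x and a compiler with OpenMP support enabled (without it, the pragma is ignored and the loop simply runs serially). The function names sum_with_threads and sum_with_openmp are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Explicit threads: split the data into chunks and sum each chunk on its
// own thread. (Hypothetical helper, for illustration only.)
double sum_with_threads(const std::vector<double>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Clamp both ends so trailing threads may receive an empty range.
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// OpenMP: the same loop, annotated with a reduction clause. If OpenMP is
// not enabled at compile time, the pragma is ignored and the loop is serial.
double sum_with_openmp(const std::vector<double>& data) {
    double sum = 0.0;
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The thread version needs explicit chunking, joining and combining of partial results, while the OpenMP version leaves all of that to the compiler and runtime. The third approach, automatic parallelization, would leave the plain serial loop untouched and rely on a compiler switch.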

So here are a few general questions for Clay and Aaron:

Do you have any guidelines for using different implementation methods for parallelization in specific situations?

Are there clear cases in which one form of parallelization, such as creating dedicated threads, outperforms the other methods?

Are there any methods with which developers can estimate the parallelization overhead, such as the cost of creating dedicated threads or of catching events, and weigh it against the expected benefit in order to choose the right parallelization technology?
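To illustrate the kind of estimate meant by the last question, here is a rough sketch that times the creation and joining of empty threads and compares that cost with the serial run time. The break-even rule in worth_parallelizing is a deliberately crude assumption (perfectly divisible work plus one launch cost per thread), not an established model, and both function names are hypothetical.

```cpp
#include <chrono>
#include <thread>

// Average cost of creating and joining one thread that does no work.
// (Hypothetical helper; the result varies by platform and system load.)
std::chrono::nanoseconds average_thread_launch_cost(int trials) {
    if (trials <= 0) return std::chrono::nanoseconds{0};
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < trials; ++i) {
        std::thread worker([] {});
        worker.join();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed) / trials;
}

// Crude break-even check (an assumption for illustration): the work divides
// evenly across the threads, and every thread adds one launch cost.
bool worth_parallelizing(std::chrono::nanoseconds serial_work,
                         std::chrono::nanoseconds launch_cost,
                         int threads) {
    if (threads < 2) return false;
    const std::chrono::nanoseconds estimated =
        serial_work / threads + launch_cost * threads;
    return estimated < serial_work;
}
```

With a measured launch cost of 20 microseconds and four threads, for example, this rule would favor parallelizing a task that takes 1 millisecond serially but not one that takes 50 microseconds. It ignores synchronization, cache effects and load imbalance, which is exactly where better guidance would help.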

Sincerely, Chris.