Okay, multithreading 101: threading can be done in user mode, in kernel mode, or in some combination of the two. The advantage of user-mode threading is that creating, switching, and synchronizing threads doesn't require an expensive transition into the kernel. The advantage of kernel-mode threading is that a blocking kernel call blocks only the thread that made it rather than the entire process.

What I'm not entirely clear on is to what degree each of the various threading packages uses each paradigm, and what optimization options are available when, for instance, you know that user-mode synchronization is good enough and you don't want to pay for a kernel call.
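To make the question concrete, here is a rough sketch of the kind of user-mode-only synchronization I have in mind. It's a bare spinlock built on C++11 `std::atomic_flag`; the names `SpinLock` and `count_with_spinlock` are made up for illustration and don't come from any particular threading package. Note that `std::thread` itself is typically backed by kernel threads, so only the locking here stays in user mode.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock: lock() and unlock() are plain atomic
// operations on a flag and make no system calls.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value, so keep spinning
        // until we are the one who changed it from clear to set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait. This burns CPU while waiting, which is the
            // price of not asking the kernel to put the thread to sleep.
        }
    }

    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads, guarding it with
// the spinlock above, and returns the final value.
long count_with_spinlock(int num_threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Something like this is only reasonable when the critical section is very short and contention is low; a waiter that spins for a long time would probably be better off blocking in the kernel. What I'd like to know is which packages let me make that choice explicitly.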

Anyone have any insight in this regard?