Okay, multithreading 101: threading can be done in user mode, in kernel mode, or in some combination of the two. The advantage of user-mode threading is that you don't pay for expensive transitions into kernel mode. The advantage of kernel-mode threading is that a blocking system call blocks only the calling thread rather than the entire process.
What I'm not entirely clear on is to what degree each of the various threading packages uses each paradigm, and what optimization options are available when, for instance, you know that user-mode synchronization is good enough and you don't want to pay for a kernel call.
First of all, user-mode "threads" (fibers) are not really threads, in the sense that they do not let you utilize parallel hardware: fibers that are all scheduled on top of a single kernel thread can never run on two cores at once. They are useful only as a replacement for explicit state machines and continuations. As far as parallel hardware is concerned, they are no more than syntactic sugar.
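To make the "replacement for state machines" point concrete, here is a minimal sketch of a fiber in C using getcontext/makecontext/swapcontext. These calls are no longer part of current POSIX but are still provided by glibc; the names squares_fiber and sum_squares_via_fiber are hypothetical. The fiber yields values back to its caller cooperatively, and everything runs on one kernel thread, so no parallelism is gained.

```c
#define _XOPEN_SOURCE 600 /* some platforms require this for ucontext */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, fib_ctx;
static char fib_stack[64 * 1024]; /* the fiber's private stack */
static int yielded;               /* value handed from the fiber to its caller */
static int fiber_done;

/* Hypothetical generator fiber: yields 1, 4, 9, then finishes. */
static void squares_fiber(void) {
    for (int i = 1; i <= 3; i++) {
        yielded = i * i;
        swapcontext(&fib_ctx, &main_ctx); /* yield back to the caller */
    }
    fiber_done = 1;
    /* returning here resumes uc_link, i.e. main_ctx */
}

/* Drives the fiber to completion and sums what it yields; one kernel thread only. */
static int sum_squares_via_fiber(void) {
    int sum = 0;
    fiber_done = 0;
    int rc = getcontext(&fib_ctx);
    assert(rc == 0);
    (void)rc;
    fib_ctx.uc_stack.ss_sp = fib_stack;
    fib_ctx.uc_stack.ss_size = sizeof fib_stack;
    fib_ctx.uc_link = &main_ctx;
    makecontext(&fib_ctx, squares_fiber, 0);
    for (;;) {
        swapcontext(&main_ctx, &fib_ctx); /* resume the fiber where it left off */
        if (fiber_done)
            break;
        sum += yielded;
    }
    return sum;
}
```

The loop counter lives on the fiber's own stack instead of in an explicit state variable; that is the structuring benefit, not a speed-up on multiple cores.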
Second, kernel-mode threads do use user-level synchronization without kernel calls. pthread_mutex_t and CRITICAL_SECTION are both fast-pathed in user space, to say nothing of the __sync_XXX/_InterlockedXXX functions. Kernel threads do not imply kernel synchronization.
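As an illustration, the following sketch (assuming GCC or Clang for the __sync builtins and a POSIX threads implementation; compile with -pthread; the function names are hypothetical) has four kernel threads increment two shared counters, one with __sync_fetch_and_add and one under a pthread_mutex_t. Both totals come out exact.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define NITERS 100000

static long atomic_counter; /* updated with a GCC/Clang __sync builtin */
static long mutex_counter;  /* updated under a pthread mutex */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        /* A single locked instruction on x86; never enters the kernel. */
        __sync_fetch_and_add(&atomic_counter, 1);

        /* On Linux/glibc an uncontended lock is an atomic op in user space;
           the futex system call is used only when the lock is contended. */
        pthread_mutex_lock(&lock);
        mutex_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs NTHREADS kernel threads; returns 1 if both counters are exact. */
static int run_counters(void) {
    pthread_t t[NTHREADS];
    atomic_counter = 0;
    mutex_counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_counter == (long)NTHREADS * NITERS
        && mutex_counter == (long)NTHREADS * NITERS;
}
```

The threads are scheduled by the kernel and can run in parallel, yet in the common case neither counter update costs a system call.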
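Moreover, given cooperative user-mode scheduling, a fiber that blocks does not have to block the entire underlying thread. Take a look at Windows 7 UMS (User-Mode Scheduling): the application's own scheduler is notified when one of its worker threads blocks in the kernel, so it can switch to another one.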
The bottom line: if you need to utilize parallel hardware, you must use kernel threads. User threads are a means of structuring programs and have nothing to do with parallelism.
There is also the abstraction of tasks, quite popular nowadays, which are basically lightweight user threads. Modern libraries usually use kernel threads for parallelism and tasks (which run on top of kernel threads) for program structuring.
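A minimal sketch of tasks running on top of kernel threads: a fixed pool of pthreads pulls function-pointer tasks from one shared, mutex-protected queue. The design and all names (pool, pool_submit, pool_shutdown, run_square_tasks) are hypothetical and deliberately simplified; production task libraries typically use per-thread queues with work stealing instead of a single queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A task is just a function plus its argument; it has no stack of its own. */
typedef struct task { void (*fn)(void *); void *arg; struct task *next; } task;

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    task *head, *tail;    /* FIFO queue of pending tasks */
    int shutting_down;
    int nworkers;
    pthread_t *workers;   /* the kernel threads that supply parallelism */
} pool;

static void *pool_worker(void *arg) {
    pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        while (!p->head && !p->shutting_down)
            pthread_cond_wait(&p->cv, &p->mu);
        if (!p->head) { /* shutting down and the queue is drained */
            pthread_mutex_unlock(&p->mu);
            return NULL;
        }
        task *t = p->head;
        p->head = t->next;
        if (!p->head) p->tail = NULL;
        pthread_mutex_unlock(&p->mu);
        t->fn(t->arg); /* run the task outside the lock */
        free(t);
    }
}

static void pool_init(pool *p, int nworkers) {
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->cv, NULL);
    p->head = p->tail = NULL;
    p->shutting_down = 0;
    p->nworkers = nworkers;
    p->workers = malloc(sizeof(pthread_t) * (size_t)nworkers);
    assert(p->workers != NULL);
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&p->workers[i], NULL, pool_worker, p);
        assert(rc == 0);
        (void)rc;
    }
}

static void pool_submit(pool *p, void (*fn)(void *), void *arg) {
    task *t = malloc(sizeof *t);
    assert(t != NULL);
    t->fn = fn; t->arg = arg; t->next = NULL;
    pthread_mutex_lock(&p->mu);
    if (p->tail) p->tail->next = t; else p->head = t;
    p->tail = t;
    pthread_cond_signal(&p->cv);
    pthread_mutex_unlock(&p->mu);
}

/* Lets workers drain the queue, then joins them. */
static void pool_shutdown(pool *p) {
    pthread_mutex_lock(&p->mu);
    p->shutting_down = 1;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mu);
    for (int i = 0; i < p->nworkers; i++)
        pthread_join(p->workers[i], NULL);
    free(p->workers);
    pthread_mutex_destroy(&p->mu);
    pthread_cond_destroy(&p->cv);
}

static void square_task(void *arg) { int *x = arg; *x = *x * *x; }

/* Squares 0..ntasks-1 as separate tasks on nthreads kernel threads; returns the sum. */
static long run_square_tasks(int ntasks, int nthreads) {
    pool p;
    long sum = 0;
    int *vals = malloc(sizeof(int) * (size_t)ntasks);
    assert(vals != NULL);
    pool_init(&p, nthreads);
    for (int i = 0; i < ntasks; i++) {
        vals[i] = i;
        pool_submit(&p, square_task, &vals[i]);
    }
    pool_shutdown(&p); /* returns only after every task has run */
    for (int i = 0; i < ntasks; i++) sum += vals[i];
    free(vals);
    return sum;
}
```

The kernel threads provide the parallelism; the tasks are just units of work that carry no stack or scheduling state of their own, which is what makes them cheap to create in large numbers.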