Quote Originally Posted by zerver
"Win32 and the Visual C++ compiler now support statically bound (load-time) per-thread data in addition to the existing API implementation."

Doesn't this mean that it also supports "real" TLS variables?
I would be interested in seeing the source of that quote. It is impossible to allocate the space for every thread at process load time, because there is no way of knowing whether 1 or 10000 threads will be running at a given time. The last time I actually looked at the disassembly, the size of the TLS block was known in advance, the block itself was allocated dynamically at thread creation, and all access went through an indexed offset.
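To make the per-thread allocation point concrete, here is a minimal sketch. It uses the standard C++11 `thread_local` keyword as a portable stand-in for the Visual C++ `__declspec(thread)` form, and the names `t_slot` and `count_distinct_tls_addresses` are hypothetical, made up for this illustration. It only shows that every live thread gets its own copy of the variable at its own address; it says nothing about how a particular compiler lays out the TLS block.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// One instance of this variable exists per thread. Storage for each instance
// belongs to that thread, so it cannot all be reserved at process load time.
thread_local int t_slot = 0;

// Hypothetical helper: starts `count` threads, records the address of each
// thread's copy of t_slot, and returns how many distinct addresses were seen.
std::size_t count_distinct_tls_addresses(int count) {
    std::set<std::uintptr_t> addresses;
    std::mutex guard;
    std::condition_variable all_recorded;
    int recorded = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        threads.emplace_back([&, i] {
            t_slot = i;  // writes this thread's copy only
            std::unique_lock<std::mutex> lock(guard);
            addresses.insert(reinterpret_cast<std::uintptr_t>(&t_slot));
            if (++recorded == count) all_recorded.notify_all();
            // Stay alive until every thread has recorded its address, so no
            // thread's TLS storage can be recycled for a later thread.
            all_recorded.wait(lock, [&] { return recorded == count; });
        });
    }
    for (auto& t : threads) t.join();
    return addresses.size();
}
```

Calling `count_distinct_tls_addresses(8)` should report 8 distinct addresses, and the calling thread's own `t_slot` is left untouched by the workers.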

You could call it a thread pool.
It either IS a thread pool, or it is not. I can call a wild lion a kitty....

I already said this, but it uses the thread number to index arrays.
Code:
results[g_threadnum]=...;

locks[g_threadnum].Lock();
So, if g_threadnum is not a "real" variable (a disguised function call) it will be too slow for my requirements.

I may choose to put all the thread data in a thread class to reduce the frequent array indexing, but the access to g_threadnum must still be fast.
1) Again, I question WHY you need this. Your posted sample makes NO sense. If there is going to be one lock per thread, and each thread only ever takes its own lock, then the lock will NEVER be contended!

2) Something being implemented as a function is not a performance issue per se. The function could be inlined and then optimized together with the surrounding code.

3) "It will be too slow for my requirements". Exactly how many nanoseconds or microseconds must the specific operation complete in? How will you handle system interruptions (e.g. interrupts, task switches)? How will you deal with hard page faults? etc....
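As a rough illustration of point 2, here is a sketch in which the thread number sits behind a small accessor function. `g_threadnum` and `results` are the names from the quoted snippet; `kMaxThreads`, `thread_num` and `run_workers` are hypothetical, and `thread_local` is assumed as the storage mechanism. A compiler is free to inline the accessor, and a worker can also read it once into a local so the hot path never repeats the lookup.

```cpp
#include <array>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kMaxThreads = 4;            // assumed fixed number of workers
thread_local int g_threadnum = -1;        // per-thread index, set at thread start
std::array<int, kMaxThreads> results{};   // one result slot per worker

// A small accessor. Being a function does not make it slow: the compiler
// may inline it and optimize it together with the calling code.
inline int thread_num() { return g_threadnum; }

// Hypothetical driver: each worker stores a value in its own slot.
void run_workers() {
    std::vector<std::thread> workers;
    for (int i = 0; i < kMaxThreads; ++i) {
        workers.emplace_back([i] {
            g_threadnum = i;                // assign this thread's number
            const int self = thread_num();  // read once, reuse the local copy
            results[self] = self * 10;      // each thread touches only its slot
        });
    }
    for (auto& w : workers) w.join();
}
```

Whether the accessor costs anything measurable depends on the compiler and platform, so the only reliable answer is to time it against the actual requirement.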