Hi!
Is the performance for accessing a thread local variable identical to a standard global variable?
Regards / Z
By thread local variable, do you mean a variable declared within the scope of the thread proc or a TLS variable?
Well, I'm kind of new to this concept, but it is TLS, I assume.
I mean a global variable declared like this:
Code:
__declspec(thread) int g_threadnum;
void test() {
    if(g_threadnum==0) {
    }
    if(g_threadnum==1) {
    }
}
void threadfunc(int threadnum) {
    g_threadnum=threadnum;
    test();
}
What I am trying to do is "mark" each thread with a variable so that each thread can instantly know "I am thread number X". These checks will be performed frequently, so I'm looking for the fastest method.
I was previously passing threadnum on into all subroutines of the thread function, but after reading about TLS I realize there may be more efficient ways of doing this.
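A minimal sketch of the "mark each thread" idea, written with the standard C++11 `thread_local` keyword as a portable stand-in for `__declspec(thread)` (that substitution is an assumption, as are the helper names `whoAmI` and `runTaggedThreads`). It only shows that each thread reads back the number it stored, without the number being passed down through the call chain.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Assumption: standard C++11 thread_local used in place of __declspec(thread).
thread_local int g_threadnum = -1;

// Hypothetical helper: reads the calling thread's own copy of g_threadnum.
int whoAmI() { return g_threadnum; }

// Starts `count` threads, tags each with its number, and records what
// whoAmI() reports from inside that thread.
std::vector<int> runTaggedThreads(int count) {
    std::vector<int> seen(count, -1);
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        threads.emplace_back([i, &seen] {
            g_threadnum = i;     // "I am thread number i"
            seen[i] = whoAmI();  // each thread writes only its own slot
        });
    }
    for (auto& t : threads) t.join();
    assert(g_threadnum == -1);   // the calling thread's copy was never touched
    return seen;
}
```

Calling `runTaggedThreads(4)` should give back the numbers 0 to 3 in order, since every thread sees only the value it wrote itself.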
Could we put aside speed considerations for a moment and consider thread safety?
The code you posted is not thread safe because there could be a conflict when two threads access the same global (if you are running two threads simultaneously).
Can you give an idea of why you need to know what thread you are in? I ask because there might be a way to structure the code that is threadsafe without the need to track which thread you are in.
Quote:
Originally Posted by Arjay
Thanks for your reply but I am confused.
You mean that the __declspec(thread) in front of the global variable has no effect at all?
That does not make sense because when I run this code, my printf debugging tells me that g_threadnum has a different value, depending on which thread reads the value.
I need the thread number for many purposes. Threads store their results in arrays that are indexed using the thread number. Threads use arrays of synchronization objects that are also indexed by thread number.
E.g.
Code:
myLocks[g_threadnum].Lock();
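The pattern described here (result slots and locks indexed by the thread number) could look roughly like the sketch below. It assumes standard C++ (`thread_local`, `std::mutex`) in place of `__declspec(thread)` and the original lock class; `kThreads`, `work`, and `runAll` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kThreads = 4;              // hypothetical fixed thread count
thread_local int g_threadnum = 0;        // assumption: stand-in for __declspec(thread)

std::array<long, kThreads> results{};    // one result slot per thread
std::array<std::mutex, kThreads> locks;  // one lock per slot

// Worker body: every access is indexed by the calling thread's own number.
void work(int threadnum, int iterations) {
    assert(threadnum >= 0 && threadnum < kThreads);
    g_threadnum = threadnum;
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(locks[g_threadnum]);
        results[g_threadnum] += 1;
    }
}

// Runs all workers, then sums the slots; the reader takes the same per-slot lock.
long runAll(int iterations) {
    results.fill(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) threads.emplace_back(work, t, iterations);
    for (auto& th : threads) th.join();
    long total = 0;
    for (int t = 0; t < kThreads; ++t) {
        std::lock_guard<std::mutex> guard(locks[t]);
        total += results[t];
    }
    return total;
}
```

In this sketch the per-slot lock only matters if some other thread reads or writes a slot while its owner is still running; here the reader waits for `join()` first, so the locking is shown purely to mirror the post.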
Quote:
Originally Posted by zerver
You are correct in terms of __declspec(thread). I misread that as a dll export.
My preference is to take more of an OO approach to a problem like this and not have global variables or global arrays accessed by multiple threads.
I don't know if you are performing synchronization tasks in the thread itself, but my preference is to create a class that provides thread safe wrappers, and to pass a pointer to this class to the thread during thread creation. Then the thread just calls the accessor methods and doesn't need to worry about synchronization. This class pointer can be passed directly during thread creation or be a member of another class or struct. For example, you may have a struct that contains the pointer to the class, and an ID that is used to identify the thread. Inside the thread proc, the ID is passed to the accessor functions so that the data for that thread can be accessed.
Another approach is to create a class with its own static thread proc and its own data. When a class instance is created, the thread is started (passing in the this pointer) and the thread works on the data within the class instance. This approach works only for situations where there are relatively few threads (< 25).
For a greater number of threads, you can leverage the QueueUserWorkItem() API. Here you create a class that contains a static thread proc, but instead of creating a thread for each class instance, you pass the thread proc to QueueUserWorkItem() and the system will create and maintain a thread pool for you.
These are just different approaches for solving the same type of problem.
Thank you. Well, global variables are the only way for me. What I'm doing is OpenGL multithreading.
If we return to my original question regarding performance of accessing the TLS variables:
If it is a "real" variable (and it seems so) the system must dynamically allocate a new code segment for each thread that wants to execute a function that uses the TLS variable. It must then perform some kind of relocation so that each thread in fact uses a different variable.
Am I right in my thinking?
If so, the memory usage would increase, but performance would not suffer, unless the system is "stupid" and performs allocation/relocation each time the function is executed...
Quote:
Originally Posted by zerver
Is the OpenGL multi-threading API different from the win32 or MFC threading APIs? What MT APIs are you using?
Quote:
Originally Posted by zerver
Yes, you are right in your thinking, but you cannot know precisely what happens behind the scenes, and it is probably not worth finding out. The underlying data structure could be anything: a map, a hash map, or something completely different.
Quote:
Originally Posted by zerver
Not every time a function is executed. If you declare the thread local variable with the __declspec(thread) declarator, an instance of the variable is created when a thread is created and probably destroyed as soon as that thread gets destroyed. That should not happen each time a function in a thread gets called.
What I do not understand is what you are calling a global variable. A simple standard global variable is not sufficient to hold a number for all threads, because it is just one variable and any read or write would have to be synchronized. Synchronizing it would not help anyway, because a single shared variable doesn't fit your purpose in the first place. Is that what you are calling a global variable? A thread local variable in TLS is effectively a global variable, but it is thread specific. There are as many instances of that name as there are threads, and you can use them without any interference among threads unless you actually share a pointer to that variable with another thread.
By the way, with pthreads, there is a function called pthread_self() which gives back the opaque thread ID of the calling thread. That is the way to actually identify threads. I suppose there should be some similar way to identify your threads rather than having to create a TLS variable. With the win32 threading APIs, you have the GetCurrentThreadId() function. That should suffice in case you are using win32 threads. Otherwise you will have to tell us which APIs you are using.
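For comparison, standard C++ offers `std::this_thread::get_id()`, which plays a similar role to `pthread_self()` and `GetCurrentThreadId()`. The sketch below (with the hypothetical helper name `workerHasDistinctId`) just checks that a new thread reports an ID different from its creator's. Note that `std::thread::id` is an opaque value, so unlike a small thread number it cannot be used directly as an array index.

```cpp
#include <cassert>
#include <thread>

// Returns true if a newly started thread reports a different ID
// than the thread that called this function.
bool workerHasDistinctId() {
    const std::thread::id callerId = std::this_thread::get_id();
    std::thread::id workerId;  // filled in by the worker itself
    std::thread worker([&workerId] { workerId = std::this_thread::get_id(); });
    const std::thread::id handleId = worker.get_id();  // creator's view; must be read before join()
    worker.join();
    assert(workerId == handleId);  // both views name the same thread
    return workerId != callerId;
}
```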
I am not sure if I have answered your question and hence you would have to explain a bit more. Is it a thread pool that you are planning to make? Why do you need to have code that does something different depending upon different thread IDs?
Thanks. Yeah, what I meant was "global TLS variable".
Hash map is unlikely because you can actually take the address of a TLS variable and use it, as long as you are inside a specific thread.
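The point about taking the address can be checked with a small sketch, again assuming standard C++ `thread_local` rather than `__declspec(thread)`. The hypothetical helper `addressesDiffer` captures the address of the same variable in two threads that are alive at the same time and compares them as integers.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

thread_local int g_threadnum = 0;  // assumption: stand-in for __declspec(thread)

// Returns true if the worker thread and the calling thread see
// different addresses for the same thread-local variable.
bool addressesDiffer() {
    const auto callerAddr = reinterpret_cast<std::uintptr_t>(&g_threadnum);
    std::uintptr_t workerAddr = 0;
    std::thread worker([&workerAddr] {
        g_threadnum = 1;  // writes the worker's copy only
        // Convert while the worker's copy still exists.
        workerAddr = reinterpret_cast<std::uintptr_t>(&g_threadnum);
    });
    worker.join();
    assert(g_threadnum == 0);  // the caller's copy is unchanged
    return workerAddr != callerAddr;
}
```

The address is only meaningful while the owning thread is alive, which is why the sketch stores it as an integer instead of keeping a pointer around after `join()`.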
Looks like I'll have to do some testing to see if reading/writing TLS is slower than a standard global variable.
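One rough way to run such a test is sketched below, assuming standard C++ (`thread_local`, `std::chrono`) and hypothetical names (`Timing`, `timeAccess`). The counters are `volatile` so the compiler has to perform every read and write; the numbers it produces depend heavily on compiler, flags, and platform, so treat it as a starting point and not as a verdict.

```cpp
#include <cassert>
#include <chrono>

volatile long g_plain = 0;             // ordinary global
thread_local volatile long g_tls = 0;  // thread-local (stand-in for __declspec(thread))

struct Timing {
    long long plainNs;  // time spent incrementing the ordinary global
    long long tlsNs;    // time spent incrementing the thread-local
};

// Increments each counter `iterations` times and reports the elapsed time.
Timing timeAccess(long iterations) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    g_plain = 0;
    g_tls = 0;
    const auto t0 = Clock::now();
    for (long i = 0; i < iterations; ++i) g_plain = g_plain + 1;
    const auto t1 = Clock::now();
    for (long i = 0; i < iterations; ++i) g_tls = g_tls + 1;
    const auto t2 = Clock::now();

    assert(g_plain == iterations && g_tls == iterations);
    return Timing{duration_cast<nanoseconds>(t1 - t0).count(),
                  duration_cast<nanoseconds>(t2 - t1).count()};
}
```

Running it several times with a large iteration count, and comparing `plainNs` with `tlsNs`, gives a first impression; a profiler on the real workload is still the more reliable measurement.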
TLS (thread local storage) IS what is used when you use __declspec(thread). Be aware that this is a Microsoft extension, and will not port to other compilers or platforms. Also the items must be simple POD (not classes!)
The actual access is done by a set of helper routines.
There are restrictions on access, including taking the address of a TLS variable.
In almost all cases, accessing a TLS variable will be slower than accessing a "true" global or static variable, and is typically equivalent to a stack or heap access.
It is critical to remember that performance is always a secondary concern to reliability and maintainability. Only if it can be determined that a specific portion of code is having a measurable impact [via the use of some type of profiling tool] should performance become the focus.
This is true for EVERY type of application, ranging from Games to Missile Guidance [it is no good to shoot at where the target was ;) ].
I am also VERY skeptical of ANY design that has any of the following characteristics:
1) Dependencies on static or global data.
2) Requires knowledge of system context (e.g. thread, process, processor)
3) Heavily depends on proprietary extensions (they can usually be encapsulated)
The "real world" issues that impact an application are usually issues that arise from outside the application itself. A dramatic example was SQLServer-2005 vs. SQLServer-2000. All of the "benchmarks" indicated a general improvement in performance (some cases dramatic, others minimal). But when deployed to one (very large) client base, the results were that SQL2005 was orders of magnitude SLOWER than SQL2000. It turned out that there was one specific model Pentium (which this company had used in over 5,000 systems) which had a slight difference in the L1/L2 caching.
The code in SQL2005 had grown by a few bytes for a key algorithm deep in the heart of SQL Server. For all other processors the code would either: Not be held in the cache (older smaller slower processors) or Would be held in the cache (all recent processors) for both the 2000 and 2005 versions.
But for the one specific processor, the SQL2000 implementation DID fit in the cache, but the SQL2005 version did NOT fit. This resulted in faults all the way back to main RAM (and potentially the disk resident page file). :eek: :eek: :eek: :eek:
"Win32 and the Visual C++ compiler now support statically bound (load-time) per-thread data in addition to the existing API implementation."
Doesn't this mean that it also supports "real" TLS variables?
Quote:
Originally Posted by exterminator
You could call it a thread pool.
I already said this, but it uses the thread number to index arrays.
Code:
results[g_threadnum]=...;
locks[g_threadnum].Lock();
So, if g_threadnum is not a "real" variable (a disguised function call) it will be too slow for my requirements.
I may choose to put all the thread data in a thread class to reduce the frequent array indexing, but the access to g_threadnum must still be fast.
Quote:
Originally Posted by zerver
I would be interested in seeing the source of that reference. It is impossible to allocate and reference the space at process load time. There is no way of knowing if there are going to be 1 or 10000 threads running at a given time. The last time I actually looked at the disassembly the size of a TLS block was known, and dynamically allocated at thread creation, then all access was via an indexed offset.
Quote:
You could call it a thread pool.
It either IS a thread pool, or it is not. I can call a wild lion a kitty....
Quote:
I already said this, but it uses the thread number to index arrays.
Code:
results[g_threadnum]=...;
locks[g_threadnum].Lock();
So, if g_threadnum is not a "real" variable (a disguised function call) it will be too slow for my requirements.
I may choose to put all the thread data in a thread class to reduce the frequent array indexing, but the access to g_threadnum must still be fast.
1) Again I question WHY the need. Your posted sample makes NO sense. If there is going to be one lock per thread, then the lock will NEVER get invoked!
2) Something being implemented as a function is not a performance issue per se. The function could be inlined, and then optimized with the surrounding code.
3) "It will be too slow for my requirements". Exactly how many nano or micro seconds must the specific operation complete in? How will you handle system interruptions (e.g. interrupts, task switches)? How will you deal with hard page faults? etc....
Whenever possible, in multithreading you want to control the access of shared data in an encapsulated way.
Code such as this can be extremely problematic in a multithreaded environment.
Code:
UINT WINAPI ThreadProc( void )
{
    results[g_threadnum]=...;
    locks[g_threadnum].Lock();
    // Do work here
    locks[g_threadnum].UnLock(); // I assume you need to unlock here
}
It's problematic because you allow external control of the locking and unlocking, and this makes it difficult to track down synchronization issues.
Now consider the design I was referring to earlier. In this design, a struct is used during thread creation to pass in a thread index and a pointer to a class that provides thread safe access to shared data.
Code:
UINT WINAPI ThreadProc( LPVOID lpVoid )
{
    // lpVoid points to the params struct supplied at thread creation
    LPTHREADPARAMS lpThreadParams = (LPTHREADPARAMS)lpVoid;
    CDataManager* pDataManager = lpThreadParams->m_pDataManager;
    UINT uThreadIndex = lpThreadParams->m_pThreadIndex;
    pDataManager->DoSomeThreadSafeWork( uThreadIndex, SomeThreadParam );
    SomeValue = pDataManager->GetSomeThreadSafeValue( uThreadIndex );
    return 0;
}
Notice in the above snippet that the thread doesn't need to worry about synchronization as the sync code is encapsulated within the CDataManager class.
Many of the perceived threading issues can be reduced following an approach such as this. IMO, developers get into trouble when they don't compartmentalize the synchronization chores within a controlling class.
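The CDataManager class itself is not shown in the thread, so the following is only a guess at its shape: a minimal sketch in standard C++ (`std::mutex`, `std::thread`) where the class owns both the data and the lock. The names `DataManager`, `AddToValue`, `GetValue`, `ThreadParams`, `threadProc`, and `runWorkers` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for CDataManager: all locking stays inside the class.
class DataManager {
public:
    explicit DataManager(std::size_t threadCount) : values_(threadCount, 0) {}

    void AddToValue(std::size_t threadIndex, int amount) {
        std::lock_guard<std::mutex> guard(mutex_);
        values_.at(threadIndex) += amount;
    }

    int GetValue(std::size_t threadIndex) const {
        std::lock_guard<std::mutex> guard(mutex_);
        return values_.at(threadIndex);
    }

private:
    mutable std::mutex mutex_;  // guards values_
    std::vector<int> values_;   // one entry per thread index
};

// Hypothetical params struct: the manager pointer plus the thread's index.
struct ThreadParams {
    DataManager* manager;
    std::size_t threadIndex;
};

// The thread proc never touches a lock directly.
void threadProc(ThreadParams params) {
    assert(params.manager != nullptr);
    for (int i = 0; i < 100; ++i) params.manager->AddToValue(params.threadIndex, 1);
}

// Starts one worker per index and returns the sum of all per-thread values.
int runWorkers(std::size_t threadCount) {
    DataManager manager(threadCount);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < threadCount; ++i)
        threads.emplace_back(threadProc, ThreadParams{&manager, i});
    for (auto& t : threads) t.join();
    int total = 0;
    for (std::size_t i = 0; i < threadCount; ++i) total += manager.GetValue(i);
    return total;
}
```

A single mutex for the whole manager is the simplest choice for a sketch; whether one lock per thread index would be worth the extra complexity is something only profiling the real application can answer.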
Quote:
Originally Posted by Arjay
:thumb: (see point #2 in reply #11!!!!!) :D