-
February 18th, 2008, 11:25 AM
#1
__declspec( thread )
Hi!
Is the performance for accessing a thread local variable identical to a standard global variable?
Regards / Z
Nobody cares how it works as long as it works
-
February 18th, 2008, 02:57 PM
#2
Re: __declspec( thread )
By thread local variable, do you mean a variable declared within the scope of the thread proc or a TLS variable?
-
February 19th, 2008, 05:47 AM
#3
Re: __declspec( thread )
Well, I'm kind of new to this concept, but I assume it is TLS.
I mean a global variable declared like this:
Code:
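__declspec(thread) int g_threadnum;   // one copy per thread (MSVC extension)

void test() {
    if (g_threadnum == 0) {
        // work specific to thread 0
    }
    if (g_threadnum == 1) {
        // work specific to thread 1
    }
}

void threadfunc(int threadnum) {
    g_threadnum = threadnum;   // stamp this thread's copy once
    test();
}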
What I am trying to do is "mark" each thread with a variable so that each thread can instantly know "I am thread number X". These checks will be performed frequently, so I'm looking for the fastest method.
I was previously passing threadnum down into all subroutines of the thread function, but after reading about TLS I realize there may be more efficient ways of doing this.
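For readers not tied to MSVC, here is a minimal sketch of the same "mark each thread" idea using standard C++11 thread_local, which plays the role of __declspec(thread). It assumes std::thread in place of the Win32 thread API; run_threads and the value returned by test() are hypothetical additions so the behavior can be observed.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Portable counterpart of "__declspec(thread) int g_threadnum;".
// Each thread gets its own zero-initialized copy, so no locking is needed.
thread_local int g_threadnum = 0;

// Hypothetical deeper routine: reads the calling thread's own number
// without it being passed down as a parameter.
int test() {
    return g_threadnum * 10;
}

// Thread entry point: stamp this thread's copy, then call deeper code.
void threadfunc(int threadnum, int* out) {
    g_threadnum = threadnum;
    *out = test();  // sees this thread's value, never another thread's
}

// Hypothetical driver: starts n threads and collects what test() saw in each.
std::vector<int> run_threads(int n) {
    std::vector<int> seen(n, -1);  // sized up front, never resized while threads run
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i)
        pool.emplace_back(threadfunc, i, &seen[i]);
    for (auto& t : pool) t.join();
    return seen;
}
```

The main thread's own copy of g_threadnum is never touched by the workers, so it stays 0.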
-
February 19th, 2008, 01:22 PM
#4
Re: __declspec( thread )
Could we put aside speed considerations for a moment and consider thread safety?
The code you posted is not thread safe because there could be a conflict when two threads access the same global (if you are running two threads simultaneously).
Can you give an idea of why you need to know what thread you are in? I ask because there might be a way to structure the code that is threadsafe without the need to track which thread you are in.
-
February 20th, 2008, 04:38 AM
#5
Re: __declspec( thread )
 Originally Posted by Arjay
The code you posted is not thread safe because there could be a conflict when two threads access the same global (if you are running two threads simultaneously).
Thanks for your reply but I am confused.
You mean that the __declspec(thread) in front of the global variable has no effect at all?
That does not make sense because when I run this code, my printf debugging tells me that g_threadnum has a different value, depending on which thread reads the value.
I need the thread number for many purposes. Threads store their results in arrays that are indexed using the thread number. Threads use arrays of synchronization objects that are also indexed by thread number.
E.g.
Code:
myLocks[g_threadnum].Lock();
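A minimal sketch of the "results indexed by thread number" pattern described above, again assuming standard thread_local and std::thread rather than the poster's actual code. The names worker, results and parallel_sum are hypothetical. Because each thread writes only its own slot, those writes need no lock.

```cpp
#include <cassert>
#include <thread>
#include <vector>

thread_local int g_threadnum = 0;  // this thread's index, 0..N-1

// Hypothetical per-thread result slots, indexed by thread number.
// Each thread writes only results[g_threadnum], so writes never overlap.
std::vector<long> results;

void worker(int threadnum, long from, long to) {
    g_threadnum = threadnum;
    long sum = 0;
    for (long i = from; i < to; ++i) sum += i;
    results[g_threadnum] = sum;  // only this thread touches this slot
}

// Splits [0, n) into nthreads chunks and sums them in parallel.
long parallel_sum(long n, int nthreads) {
    results.assign(nthreads, 0);  // sized before any thread starts
    std::vector<std::thread> pool;
    long chunk = n / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        long from = t * chunk;
        long to = (t == nthreads - 1) ? n : from + chunk;  // last chunk takes the remainder
        pool.emplace_back(worker, t, from, to);
    }
    for (auto& th : pool) th.join();  // join makes the workers' writes visible here
    long total = 0;
    for (long r : results) total += r;
    return total;
}
```

Distinct vector elements are distinct memory locations, so there is no data race, although neighbouring slots may share a cache line (false sharing) and cost some speed.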
-
February 20th, 2008, 08:00 AM
#6
Re: __declspec( thread )
 Originally Posted by zerver
Thanks for your reply but I am confused.
You mean that the __declspec(thread) in front of the global variable has no effect at all?
That does not make sense because when I run this code, my printf debugging tells me that g_threadnum has a different value, depending on which thread reads the value.
I need the thread number for many purposes. Threads store their results in arrays that are indexed using the thread number. Threads use arrays of synchronization objects that are also indexed by thread number.
E.g.
Code:
myLocks[g_threadnum].Lock();
You are correct in terms of __declspec(thread). I misread that as a dll export.
My preference is to take a more OO approach to a problem like this and avoid global variables or global arrays that are accessed by multiple threads.
I don't know if you are performing synchronization tasks in the thread itself, but my preference is to create a class that provides thread-safe wrappers and to pass a pointer to this class to the thread during thread creation. Then the thread just calls the accessor methods and doesn't need to worry about synchronization. This class pointer can be passed directly during thread creation or be a member of another class or struct. For example, you may have a struct that contains the pointer to the class and an ID that identifies the thread. Inside the thread proc, the ID is passed to the accessor functions so that the data for that thread can be accessed.
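The design just described can be sketched roughly as follows. This assumes std::thread and std::mutex instead of Win32 threads and critical sections, and DataManager, ThreadParams and RunWorkers are hypothetical names. The only place a lock is taken is inside DataManager.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class that owns the shared data and all of its locking.
class DataManager {
public:
    explicit DataManager(int nthreads) : values_(nthreads, 0) {}

    // Thread-safe accessors: callers never touch the mutex themselves.
    void Add(int threadIndex, int amount) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[threadIndex] += amount;
        total_ += amount;  // truly shared, so it must be protected
    }

    int Get(int threadIndex) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return values_[threadIndex];
    }

    int Total() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> values_;
    int total_ = 0;
};

// Hypothetical parameter block handed to each thread at creation time.
struct ThreadParams {
    DataManager* pDataManager;
    int threadIndex;
};

void ThreadProc(ThreadParams params) {
    for (int i = 0; i < 1000; ++i)
        params.pDataManager->Add(params.threadIndex, 1);
}

// Runs nthreads workers; returns the grand total and optionally thread 0's count.
int RunWorkers(int nthreads, int* firstThreadCount = nullptr) {
    DataManager dm(nthreads);
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back(ThreadProc, ThreadParams{&dm, t});
    for (auto& th : pool) th.join();
    if (firstThreadCount) *firstThreadCount = dm.Get(0);
    return dm.Total();
}
```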
Another approach is to create a class with its own static thread proc and its own data. When a class instance is created, the thread is started (passing in the this pointer) and the thread works on the data within that class instance. This approach works only for situations where there are relatively few threads (< 25).
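A minimal sketch of the "class with its own static thread proc" approach, assuming std::thread in place of CreateThread/_beginthreadex. Worker and its summing task are hypothetical; the point is that the constructor starts the thread with the this pointer and the static proc works only on that instance's data.

```cpp
#include <cassert>
#include <thread>

// Hypothetical worker that owns its data and its thread.
class Worker {
public:
    explicit Worker(int n) : n_(n), thread_(&Worker::ThreadProc, this) {}
    ~Worker() { if (thread_.joinable()) thread_.join(); }
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    // Waits for the thread, then returns its result.
    long Result() {
        if (thread_.joinable()) thread_.join();
        return result_;
    }

private:
    // Static thread proc: reaches the instance through the pointer passed at start.
    static void ThreadProc(Worker* self) {
        long sum = 0;
        for (int i = 1; i <= self->n_; ++i) sum += i;
        self->result_ = sum;
    }

    int n_;
    long result_ = 0;
    std::thread thread_;  // declared last so n_ and result_ are initialized first
};
```

Declaring thread_ last matters: members are initialized in declaration order, so the thread cannot start before the data it reads exists.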
For a greater number of threads, you can leverage the QueueUserWorkItem() API. Here you create a class that contains a static thread proc, but instead of creating a thread for each class instance, you pass the thread proc to QueueUserWorkItem() and the system will create and maintain a thread pool for you.
These are just different approaches for solving the same type of problem.
-
February 20th, 2008, 10:45 AM
#7
Re: __declspec( thread )
Thank you. Well, global variables are the only way for me. What I'm doing is OpenGL multithreading.
If we return to my original question regarding performance of accessing the TLS variables:
If it is a "real" variable (and it seems so) the system must dynamically allocate a new code segment for each thread that wants to execute a function that uses the TLS variable. It must then perform some kind of relocation so that each thread in fact uses a different variable.
Am I right in my thinking?
If so, the memory usage would increase, but performance would not suffer, unless the system is "stupid" and performs allocation/relocation each time the function is executed...
-
February 20th, 2008, 11:14 AM
#8
Re: __declspec( thread )
 Originally Posted by zerver
Thank you. Well, global variables are the only way for me. What I'm doing is OpenGL multithreading.
Is the OpenGL multithreading API different from the Win32 or MFC threading APIs? Which MT APIs are you using?
 Originally Posted by zerver
If it is a "real" variable (and it seems so) the system must dynamically allocate a new code segment for each thread that wants to execute a function that uses the TLS variable. It must then perform some kind of relocation so that each thread in fact uses a different variable.
Am I right in my thinking?
Yes, you are right in your thinking, but you cannot know precisely what happens behind the scenes, and it is probably not worth worrying about. The underlying data structure could be anything: a map, a hash map, or something completely different.
 Originally Posted by zerver
If so, the memory usage would increase, but performance would not suffer, unless the system is "stupid" and performs allocation/relocation each time the function is executed...
Not every time a function is executed. If you declare the thread-local variable with the __declspec(thread) declarator, the variable is created when a thread is created and probably destroyed as soon as that thread is destroyed. That should not happen each time a function in the thread is called.
What I don't understand is what you are calling a global variable. A simple standard global variable is not sufficient to hold a number for all threads, because it is just one variable and every read or write would have to be synchronized; synchronizing it is beside the point, though, because a single shared value doesn't fit your purpose in the first place. Is that what you are calling a global variable? A thread-local variable in TLS is effectively a global variable, but it is thread specific. There are as many instances of that same name as there are threads, and you can use them without any interference among threads unless you actually share a pointer to that variable with another thread.
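The "one name, many instances" point can be checked with a small sketch, assuming standard C++11 thread_local as the stand-in for a TLS variable. g_counter, CounterAddress and SeparateInstances are hypothetical names. The addresses are compared inside the second thread, while both copies are still alive.

```cpp
#include <cassert>
#include <thread>

thread_local int g_counter = 0;  // one name, one instance per thread

// Returns the address of the calling thread's g_counter.
const int* CounterAddress() { return &g_counter; }

// Bumps the counter in a second thread and checks that it is a separate
// instance: different address, its own value, and the caller's copy untouched.
bool SeparateInstances() {
    const int* mine = CounterAddress();  // calling thread's copy
    int seenInOther = -1;
    bool differs = false;
    std::thread t([&] {
        g_counter += 5;                       // modifies only the new thread's copy
        seenInOther = g_counter;
        differs = (CounterAddress() != mine); // compared while both copies exist
    });
    t.join();
    return differs && seenInOther == 5 && g_counter == 0;
}
```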
By the way, with pthreads there is a function called pthread_self() which returns the opaque thread ID of the calling thread. That is the usual way to identify threads, and I suppose there should be some similar way to identify your threads rather than having to create a TLS variable. With the Win32 threading APIs you have the GetCurrentThreadId() function, which should suffice if you are using Win32 threads. Otherwise you will have to tell us which APIs you are using.
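As a portable counterpart to pthread_self() and GetCurrentThreadId(), standard C++11 offers std::this_thread::get_id(). A minimal sketch, where IdsDiffer is a hypothetical helper: the ID seen inside a thread matches the one reported by its std::thread handle and differs from the caller's.

```cpp
#include <cassert>
#include <thread>

// std::this_thread::get_id() returns an opaque, comparable thread ID.
bool IdsDiffer() {
    std::thread::id mainId = std::this_thread::get_id();
    std::thread::id workerId;  // default-constructed: "no thread"
    std::thread t([&] { workerId = std::this_thread::get_id(); });
    std::thread::id fromHandle = t.get_id();  // same ID, seen from outside
    t.join();
    return workerId != mainId && workerId == fromHandle;
}
```

Because such an ID is opaque rather than a small 0..N-1 number, it cannot be used directly as an array index; a stored thread number is still needed for that.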
Last edited by exterminator; February 20th, 2008 at 11:17 AM.
Can you help me with my homework assignment?, Before you post!, Use code tags, How to post!, Codeguru technical FAQs, C++ FAQ Lite, Stroustrup: C++ Style and Technique FAQ, Guru of the Week, Comeau C and C++ FAQs, Comeau C++ Templates FAQs, CUJ @ DDJ, Spam threshold
My Blogs : Learning C++ is fun | Abnegator's reflections
Open Threads : C++ Aha! Moments | Nature of work in C++?
-
February 20th, 2008, 11:22 AM
#9
Re: __declspec( thread )
I am not sure if I have answered your question, so you would have to explain a bit more. Is it a thread pool that you are planning to make? Why do you need code that does something different depending on the thread ID?
-
February 20th, 2008, 11:29 AM
#10
Re: __declspec( thread )
Thanks. Yeah, what I meant was "global TLS variable".
Hash map is unlikely because you can actually take the address of a TLS variable and use it, as long as you are inside a specific thread.
Looks like I'll have to do some testing to see if reading/writing TLS is slower than a standard global variable.
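One way to run such a test is a crude micro-benchmark like the sketch below, assuming standard thread_local in place of __declspec(thread). Both variables are declared volatile (an assumption made for measurement) so the compiler must reload them on every iteration instead of hoisting the load out of the loop; that itself adds cost. Results depend heavily on compiler, optimization flags and platform, so treat the printed numbers as rough relative indications only. The names TimeLoop and CompareAccess are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

volatile int g_plain = 1;              // ordinary global
thread_local volatile int g_tls = 1;   // thread-local counterpart

volatile long g_sink = 0;  // keeps the sums observable

// Reads a value 'iterations' times via 'read', prints the elapsed time,
// and returns the sum as a checksum.
template <typename F>
long TimeLoop(const char* label, long iterations, F read) {
    auto start = std::chrono::steady_clock::now();
    long sum = 0;
    for (long i = 0; i < iterations; ++i) sum += read();
    auto stop = std::chrono::steady_clock::now();
    g_sink = sum;
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("%s: %lld ns\n", label, static_cast<long long>(ns));
    return sum;
}

// Times both kinds of access; returns true if both loops read the same values.
bool CompareAccess(long iterations) {
    long a = TimeLoop("global", iterations, [] { return g_plain; });
    long b = TimeLoop("thread_local", iterations, [] { return g_tls; });
    return a == b;
}
```

Calling CompareAccess with a large iteration count (for example 100000000) prints one timing line per variant; compare them on the actual target machine and build settings.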
-
February 20th, 2008, 11:58 AM
#11
Re: __declspec( thread )
TLS (thread local storage) IS what is used when you use __declspec(thread). Be aware that this is a Microsoft extension, and will not port to other compilers or platforms. Also, the items must be simple POD types (not classes!).
The actual access is done by a set of helper routines.
There are restrictions on access, including on taking the address of a TLS variable.
In almost all cases, accessing a TLS variable will be slower than accessing a "true" global or static variable, and it is typically equivalent to a stack or heap access.
It is critical to remember that performance is always a secondary concern to reliability and maintainability. Only if it can be determined [via some type of profiling tool] that a specific portion of code has a measurable impact should performance become the focus.
This is true for EVERY type of application, ranging from games to missile guidance [it is no good to shoot at where the target was].
I am also VERY skeptical of ANY design that has any of the following characteristics:
1) Dependencies on static or global data.
2) Requires knowledge of system context (e.g. thread, process, processor)
3) Heavily depends on proprietary extensions (they can usually be encapsulated)
The "real world" issues that impact an application usually arise from outside the application itself. A dramatic example was SQLServer-2005 vs. SQLServer-2000. All of the "benchmarks" indicated a general improvement in performance (dramatic in some cases, minimal in others). But when deployed to one (very large) client base, SQL2005 turned out to be orders of magnitude SLOWER than SQL2000. It turned out that one specific Pentium model (which this company had used in over 5,000 systems) had a slight difference in its L1/L2 caching.
The code in SQL2005 had grown by a few bytes for a key algorithm deep in the heart of SQL Server. For all other processors, the code for both the 2000 and 2005 versions would either not be held in the cache (older, smaller, slower processors) or would be held in the cache (all recent processors).
But for that one specific processor, the SQL2000 implementation DID fit in the cache while the SQL2005 version did NOT. This resulted in faults all the way back to main RAM (and potentially the disk-resident page file).
TheCPUWizard is a registered trademark, all rights reserved. (If this post was helpful, please RATE it!)
2008, 2009,2010
In theory, there is no difference between theory and practice; in practice there is.
* Join the fight, refuse to respond to posts that contain code outside of [code] ... [/code] tags. See here for instructions 
* How NOT to post a question here
* Of course you read this carefully before you posted
* Need homework help? Read this first
-
February 20th, 2008, 12:08 PM
#12
Re: __declspec( thread )
"Win32 and the Visual C++ compiler now support statically bound (load-time) per-thread data in addition to the existing API implementation."
Doesn't this mean that it also supports "real" TLS variables?
 Originally Posted by exterminator
I am not sure if I have answered your question, so you would have to explain a bit more. Is it a thread pool that you are planning to make? Why do you need code that does something different depending on the thread ID?
You could call it a thread pool.
I already said this, but it uses the thread number to index arrays.
Code:
results[g_threadnum]=...;
locks[g_threadnum].Lock();
So, if g_threadnum is not a "real" variable (a disguised function call) it will be too slow for my requirements.
I may choose to put all the thread data in a thread class to reduce the frequent array indexing, but the access to g_threadnum must still be fast.
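A minimal sketch of the "thread class" idea: gather the per-thread data into one context object and keep a single thread_local pointer to it, so each routine does one TLS read and then ordinary member access. It assumes standard thread_local and std::thread; ThreadContext, t_ctx, DoWork and RunWithContexts are hypothetical names.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread context gathering what used to be parallel arrays.
struct ThreadContext {
    int threadnum = 0;
    long result = 0;
};

// One TLS slot per thread, pointing at that thread's context.
thread_local ThreadContext* t_ctx = nullptr;

void DoWork(long n) {
    ThreadContext& ctx = *t_ctx;  // one TLS read, then plain member access
    for (long i = 1; i <= n; ++i) ctx.result += i * (ctx.threadnum + 1);
}

std::vector<long> RunWithContexts(int nthreads, long n) {
    std::vector<ThreadContext> contexts(nthreads);  // owned by the caller, never resized
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        contexts[t].threadnum = t;  // set before the thread starts
        pool.emplace_back([&contexts, t, n] {
            t_ctx = &contexts[t];  // stamp this thread once at start-up
            DoWork(n);
        });
    }
    for (auto& th : pool) th.join();
    std::vector<long> out;
    for (const auto& c : contexts) out.push_back(c.result);
    return out;
}
```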
-
February 20th, 2008, 12:49 PM
#13
Re: __declspec( thread )
 Originally Posted by zerver
"Win32 and the Visual C++ compiler now support statically bound (load-time) per-thread data in addition to the existing API implementation."
Doesn't this mean that it also supports "real" TLS variables?
I would be interested in seeing the source of that reference. It is impossible to allocate and reference the space at process load time. There is no way of knowing if there are going to be 1 or 10000 threads running at a given time. The last time I actually looked at the disassembly the size of a TLS block was known, and dynamically allocated at thread creation, then all access was via an indexed offset.
You could call it a thread pool.
It either IS a thread pool, or it is not. I can call a wild lion a kitty....
I already said this, but it uses the thread number to index arrays.
Code:
results[g_threadnum]=...;
locks[g_threadnum].Lock();
So, if g_threadnum is not a "real" variable (a disguised function call) it will be too slow for my requirements.
I may choose to put all the thread data in a thread class to reduce the frequent array indexing, but the access to g_threadnum must still be fast.
1) Again I question WHY the need. Your posted sample makes NO sense. If there is going to be one lock per thread, then the lock will NEVER get invoked!
2) Something being implemented as a function is not a performance issue per se. The function could be inlined, and then optimized with the surrounding code.
3) "It will be too slow for my requirements". Exactly how many nano- or microseconds must the specific operation complete in? How will you handle system interruptions (e.g. interrupts, task switches)? How will you deal with hard page faults? etc....
-
February 20th, 2008, 04:19 PM
#14
Re: __declspec( thread )
Whenever possible, in multithreading you want to control the access of shared data in an encapsulated way.
Code such as this can be extremely problematic in a multithreaded environment.
Code:
UINT WINAPI ThreadProc( LPVOID /*lpVoid*/ )
{
    results[g_threadnum]=...;
    locks[g_threadnum].Lock();
    // Do work here
    locks[g_threadnum].UnLock(); // I assume you need to unlock here
    return 0;
}
It's problematic because you allow external control of the locking and unlocking, and this makes it difficult to track down synchronization issues.
Now consider the design I was referring to earlier. In this design, a struct is used during thread creation to pass in a thread index and a pointer to a class that provides thread-safe access to shared data.
Code:
UINT WINAPI ThreadProc( LPVOID lpVoid )
{
    LPTHREADPARAMS lpThreadParams = (LPTHREADPARAMS)lpVoid;
    // lpThreadParams is a pointer, so members are reached with ->
    CDataManager* pDataManager = lpThreadParams->m_pDataManager;
    UINT uThreadIndex = lpThreadParams->m_uThreadIndex;
    // All synchronization happens inside CDataManager
    pDataManager->DoSomeThreadSafeWork( uThreadIndex, SomeThreadParam );
    SomeValue = pDataManager->GetSomeThreadSafeValue( uThreadIndex );
    return 0;
}
Notice in the above snippet that the thread doesn't need to worry about synchronization as the sync code is encapsulated within the CDataManager class.
Many of the perceived threading issues can be reduced by following an approach such as this. IMO, developers get into trouble when they don't compartmentalize the synchronization chores within a controlling class.
-
February 20th, 2008, 10:39 PM
#15
Re: __declspec( thread )
 Originally Posted by Arjay
IMO, developers get into trouble when they don't compartmentalize the synchronization chores within a controlling class.
(see point #2 in reply #11!!!!!)