Hidden synchronization...?


bp_1986
June 22nd, 2011, 11:36 AM
Hi everybody. I'm developing a multithreaded program in SIMD style, so I'm partitioning an array of, say, N elements into N_thread pieces, each piece made of thread_dim elements. Here is a simplified version of the thread code:


void my_thread(int* array, const int* shared_data, int N, int thread_dim, int thread_id)
{
    // ...
    for (int i = 0; i < thread_dim; ++i)
    {
        int index = thread_id * thread_dim + i;
        if (index < N)
            array[index] = shared_data[0] + index;
    }
    // ...
}

thread_id is 0, 1, 2, ..., N_thread-1, so thread_id*thread_dim is an offset that makes race conditions between threads impossible: the threads share the same pointer but never access the same position. My question is: does the OS add some kind of hidden synchronization to pointer accesses, even if the threads are accessing different (maybe contiguous) elements? And does declaring the pointer as const int* avoid synchronization problems (if any) when threads access the same position only to read from it?
In addition, I'm compiling everything on Windows XP with Visual Studio 2008, and I'll do the same on Linux. I use Boost Threads.
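For reference, a driver for the function above could look roughly like the sketch below. It assumes C++11 std::thread rather than Boost Threads (so it needs a newer compiler than Visual Studio 2008), and run_partitioned with its parameters is a hypothetical name used only for illustration, not part of the real program.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Same worker as in the post: each thread fills only its own slice of `array`.
void my_thread(int* array, const int* shared_data, int N, int thread_dim, int thread_id)
{
    for (int i = 0; i < thread_dim; ++i)
    {
        int index = thread_id * thread_dim + i;
        if (index < N)
            array[index] = shared_data[0] + index;
    }
}

// Hypothetical driver: splits N elements across n_thread workers and
// waits for all of them before returning the filled array.
std::vector<int> run_partitioned(int N, int n_thread, int base)
{
    assert(n_thread > 0);
    std::vector<int> array(N, 0);
    const int shared_data[1] = { base };   // read-only data seen by every thread

    // Round up so that n_thread * thread_dim >= N; the `index < N` test
    // in the worker handles the shorter last slice.
    const int thread_dim = (N + n_thread - 1) / n_thread;

    std::vector<std::thread> workers;
    for (int id = 0; id < n_thread; ++id)
        workers.emplace_back(my_thread, array.data(), shared_data, N, thread_dim, id);
    for (auto& w : workers)
        w.join();                          // join() makes the workers' writes visible here
    return array;
}
```

Calling run_partitioned(10, 4, 100) gives thread_dim = 3, so threads 0 to 2 each write three elements and thread 3 writes only the last one.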
Thanks to everyone who is kind enough to answer!

Biagio

Codeplug
June 22nd, 2011, 12:37 PM
>> is there some kind of hidden synchronization added by OS in pointer accesses
No.

If multiple threads are only reading from the same memory location, and that memory hasn't changed since the threads started, then no synchronization is needed.

>> threads are accessing different - maybe contiguous - elements
Contiguous or not, as long as all threads are only reading then synchronization is not needed.

If threads do need to write (using synchronization), the only thing you need to be aware of with contiguous memory is "word tearing". More specifically, you are in danger of word tearing if two threads write to adjacent memory locations and both of those locations fit within the natural memory granularity of the processor. Since you're accessing contiguous ints through an int pointer, you shouldn't have any problems on an x86-based processor unless the int pointer is totally unaligned (which it shouldn't be). I can't speak for 64-bit, non-x86 processors though, e.g. http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V50_HTML/ARH9RATE/DOCU_007.HTM#gran_sec
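As a small illustration of the adjacent-write case, here is a sketch in which two threads each increment their own neighbouring int with no lock at all. It assumes a C++11 compiler, where the standard treats distinct array elements as separate memory locations, so this is not a data race. The names bump and count_side_by_side are made up for the example.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Increments *slot `iterations` times. Each thread is handed a different slot.
void bump(int* slot, int iterations)
{
    for (int i = 0; i < iterations; ++i)
        ++*slot;
}

// Runs two threads on neighbouring ints and returns both final counts.
// If a write to one int clobbered its neighbour (word tearing), one of the
// counts would come out wrong.
std::pair<int, int> count_side_by_side(int iterations)
{
    assert(iterations >= 0);
    int slots[2] = { 0, 0 };               // adjacent, naturally aligned ints
    std::thread t0(bump, &slots[0], iterations);
    std::thread t1(bump, &slots[1], iterations);
    t0.join();
    t1.join();
    return std::make_pair(slots[0], slots[1]);
}
```

Both counts should equal iterations. The same experiment with two threads sharing a single int would be a genuine race and would need synchronization.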

gg

bp_1986
June 22nd, 2011, 01:34 PM
Thank you, Codeplug. I had some doubts because, when running some multithreaded programs on a quad-core and looking at the average execution time of each program, I found that processing 1000000 elements with a single thread was too often faster than processing them with 2, 3 or 4 threads, so I thought there was some kind of hidden synchronization degrading performance. Again, thanks a lot.

Biagio

Codeplug
June 22nd, 2011, 02:48 PM
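"False sharing" can affect performance. It is a form of CPU cache synchronization: when threads on different cores write to separate variables that happen to share a cache line, the cores keep invalidating each other's copy of that line. You could inspect your code for possible false sharing.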

http://developer.amd.com/Membership/Print.aspx?ArticleID=32&web=http%3a%2f%2fdeveloper.amd.com
http://drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=217500206
http://software.intel.com/en-us/blogs/2008/10/09/eliminate-false-sharing-wrong/
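One common way to avoid false sharing is to pad each thread's frequently written slot out to a full cache line. The sketch below is only an illustration of that idea: the 64-byte line size is an assumption (typical for x86, not guaranteed), C++11 std::thread is used instead of Boost Threads, and PaddedCounter, count_even and count_even_parallel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size in bytes; check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// One counter per thread, padded so that consecutive counters are a full
// cache line apart and their `value` fields can never share a line.
struct PaddedCounter
{
    long value;
    char pad[kCacheLine - sizeof(long)];
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "counter must fill one cache line");

// Counts even numbers in data[begin, end), writing only to this thread's counter.
void count_even(const int* data, int begin, int end, PaddedCounter* out)
{
    for (int i = begin; i < end; ++i)
        if (data[i] % 2 == 0)
            ++out->value;                  // frequent write, private cache line
}

long count_even_parallel(const std::vector<int>& data, int n_thread)
{
    assert(n_thread > 0);
    std::vector<PaddedCounter> counters(n_thread);   // value-initialized to zero
    const int N = static_cast<int>(data.size());
    const int chunk = (N + n_thread - 1) / n_thread; // slice size, rounded up

    std::vector<std::thread> workers;
    for (int id = 0; id < n_thread; ++id)
    {
        const int begin = std::min(N, id * chunk);
        const int end = std::min(N, begin + chunk);
        workers.emplace_back(count_even, data.data(), begin, end, &counters[id]);
    }
    for (auto& w : workers)
        w.join();

    long total = 0;
    for (const auto& c : counters)
        total += c.value;
    return total;
}
```

Without the pad member, several counters would sit on one cache line and every increment could force that line to bounce between cores. Accumulating into a local variable and storing the result once at the end of the loop is an alternative that avoids the padding altogether.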

gg

bp_1986
June 23rd, 2011, 05:31 AM
Thank you, Codeplug. It's exactly what I was looking for! The Dr. Dobb's article in particular is really useful to me. I'll correct my code and check whether anything changes. Anyway, thanks again and goodbye.

Biagio