I watched with interest 'Parallel Programming Talk #66 - Listener Question "What is “acquire memory access semantics” and do I need to worry about this in parallel programming?"' which features Mr. Tersteeg and Dr. Breshears.

http://software.intel.com/en-us/blog...l-programming/

The Talk is very informative and helped me learn quite a bit. I ended up with four additional questions after pondering for a while.

(I hope it is OK to ask about this video here on a different website. That is, I hope I'm not violating some internet etiquette of which I am unaware.)

1. I did not know the Interlocked* API functions effected a fence as described in the video. I have used them quite a bit in multithreading as a faster alternative to a critical section for simple changes to variables, and assumed that the operating system was responsible for ensuring only one thread was inside the call for any given destination variable. Thus, it seems confusing to me that these API functions are seemingly accomplishing two not-necessarily-related purposes. That is, on the one hand they are a quick critical section (so I suppose), and on the other they are a fence to the processor via a special machine instruction they use internally. Can someone explain why these two purposes are intertwined in these functions?

2. Dr. Breshears states that loading from a volatile variable effects an acquire fence and storing into a volatile variable effects a release fence (or maybe I've got that backwards). Can someone expound on this? I don't see why this would be the case.

3. I think I generally get the "[acquire|release] memory access semantics" in terms of their asymmetric fence properties. Can someone give a simple example of how this asymmetry can be used to achieve something more efficient than a full fence?

4. Is it only the Itanium that supports the acquire/release versions of these functions, or are they available on Xeons and i7s?
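
To make questions 1 and 3 concrete, here is a minimal sketch of my current understanding. It is not taken from the Talk. I am assuming C++11 std::atomic as a portable stand-in for the Interlocked* functions, and the function names (count_with_atomics, publish_and_consume) are made up for illustration. The first function is how I have been using interlocked increments in place of a critical section; the second is the kind of acquire/release pairing I believe the Talk is describing. Please correct me if either part is wrong.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Question 1: several threads bump one shared counter with no critical section.
// Assumption: fetch_add with its default (sequentially consistent) ordering
// plays the role of InterlockedIncrement here. It is a stand-in, not the
// Win32 call itself.
long count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1);  // atomic read-modify-write
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}

// Question 3: one thread publishes plain data, another consumes it.
// The store uses release ordering and the load uses acquire ordering,
// rather than a full fence on both sides.
int publish_and_consume(int value) {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // write the data first
        ready.store(true, std::memory_order_release);  // then publish the flag
    });
    std::thread consumer([&] {
        // Spin until the flag is visible; the acquire load pairs with the
        // release store, so the payload write is visible once it returns true.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(seen == value);
    return seen;
}
```

Calling count_with_atomics(4, 10000) should return 40000, and publish_and_consume(42) should return 42. What I am unsure about is whether the release/acquire pair in the second function is actually cheaper than full fences on the processors mentioned in question 4.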

Thanks to anyone for information, and to Mr. Tersteeg and Dr. Breshears for the informative Talk video!
GeoRanger