Interlocked* API functions and "memory access semantics"

**Codeplug** · April 23rd, 2010, 01:25 AM

>> Can someone explain why these two purposes are intertwined in these functions?
Win32 critical sections are implemented using "memory barriers". In other words, they use instructions that guarantee a well-defined outcome when multiple threads access the same memory location at the same time.

Critical sections provide the strong guarantee of sequential consistency - analogous to a full memory fence. A full memory fence is a relatively expensive operation. It can involve flushing cache lines between cores, or waiting for outstanding accesses to/from main memory to complete. An acquire/release fence is typically less expensive, so algorithms that don't require a full fence can benefit from using it instead.
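To make the acquire/release idea concrete, here is a minimal sketch of a one-shot handoff between two threads. It assumes C++11 `std::atomic` as a portable stand-in for the Interlocked* calls with acquire/release semantics; `Mailbox`, `publish`, `consume` and `run_handoff` are hypothetical names for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical type for illustration; not from the original post.
struct Mailbox {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
};

// Producer: write the data, then publish it with a release store.
// Writes made before the release store cannot be reordered after it.
void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer: wait until the flag is observed with an acquire load.
// Reads made after the acquire load cannot be reordered before it.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer and one consumer; returns what the consumer saw.
int run_handoff(int value) {
    Mailbox box;
    int seen = 0;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { publish(box, value); });
    producer.join();
    consumer.join();
    return seen;
}
```

Neither side needs a full fence here: the release store only has to keep earlier writes from sinking below it, and the acquire load only has to keep later reads from rising above it.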

Another aspect is that a critical section may make a kernel call in order to block the thread until the critical section becomes available. Using only the Interlocked APIs (correctly) avoids this relatively expensive operation. However, for most use cases critical sections are just as fast, since they already implement optimizations that avoid the kernel call when there is no contention (using Interlocked operations under the hood).
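As a rough illustration of that fast path, here is a sketch of a lock whose uncontended acquire is a single atomic exchange. This is not how CRITICAL_SECTION is actually implemented; it assumes C++11 `std::atomic`, the class and function names are hypothetical, and the slow path simply yields where a real critical section would eventually block on a kernel object.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical lock for illustration only.
class SpinThenYieldLock {
    std::atomic<bool> locked_{false};
public:
    // Fast path: one atomic exchange, no kernel call when uncontended.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        int spins = 0;
        while (!try_lock()) {
            // Slow-path stand-in: after spinning for a while, give up
            // the time slice instead of blocking on a kernel object.
            if (++spins > 100) {
                std::this_thread::yield();
            }
        }
    }

    // Release store publishes everything done while holding the lock.
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain counter from several threads under the lock.
long count_with_lock(int threads, int iterations) {
    SpinThenYieldLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

When no other thread holds the lock, `lock()` costs one atomic exchange and never leaves user mode, which is the same reason an uncontended critical section is cheap.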

>> 2.
At ~24:22 of the video, he mentions the load/store, acquire/release of volatile variables. He also goes on to say "this only works at the compiler level". In other words, it doesn't prevent reordering of reads and writes at the hardware level.

So "at the compiler level" really means the order of the compiler-generated instructions (which the HW may still be free to reorder). From a standards perspective, volatile accesses behave more like a full fence with respect to each other - because the compiler can't move one volatile load/store before or after another. But again, this alone doesn't help when multi-threading, since the HW can still reorder reads and writes.
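The classic "store buffering" test shows the difference. Each thread stores to one variable and then loads the other. With plain `volatile int` the compiler keeps the store before the load, but hardware such as x86 may still let the load complete before the store is visible to the other core, so both threads can read 0. The sketch below assumes C++11 `std::atomic` with sequentially consistent ordering, which forbids that outcome; the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test (hypothetical helper, illustration only).
// Returns true if both threads read 0, which would require a store to
// be reordered after the load that follows it in program order.
bool both_read_zero_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();

    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appears.
int count_reordered_runs(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_read_zero_once()) {
            ++hits;
        }
    }
    return hits;
}
```

With sequentially consistent atomics the count is always zero. Swapping the atomics for `volatile int` would make the program a data race in standard C++ and, on typical hardware, would allow the "both read 0" result to show up now and then.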

>> 3.
In general, a full fence is more expensive than an acquire/release fence. Identifying when you need one and not the other is the tricky part. Typically only lock-free or wait-free algorithms are concerned with this level of granularity. A simple example would be the double-checked locking idiom, which is covered here: "C++ and the Perils of Double-Checked Locking" (also a good read for understanding how reordering can bite you).
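For reference, a sketch of double-checked locking where acquire/release is enough and no full fence is needed. It assumes C++11 `std::atomic` and `std::mutex` rather than the Win32 primitives, and `Widget` and `LazyWidget` are hypothetical names; it is one way to write the idiom, not the version from the paper.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Widget { int value = 7; };   // hypothetical payload type

// Hypothetical lazily-initialized holder, for illustration only.
class LazyWidget {
    std::atomic<Widget*> instance_{nullptr};
    std::mutex init_mutex_;
public:
    Widget* get() {
        // First check, no lock. The acquire load pairs with the release
        // store below, so a non-null pointer implies the Widget it
        // points to is fully constructed.
        Widget* p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(init_mutex_);
            // Second check, under the lock. The mutex already orders
            // this load against the store made by another initializer.
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Widget();
                // Publish only after construction has finished.
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

    ~LazyWidget() {
        delete instance_.load(std::memory_order_relaxed);
    }
};
```

The broken version of the idiom uses a plain pointer for `instance_`, which lets the pointer become visible before the object's fields are. The release store and acquire load are exactly the ordering that fixes it.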

gg
