Thanks for the reply. I understand what you're saying, and I certainly can't argue with what's in the C++ standard, but I'm still hung up on this passage from the article Richard.J linked (http://www.ddj.com/cpp/184403766):

Suppose the compiler figures out that Sleep(1000) is a call into an external library that cannot possibly modify the member variable flag_. Then the compiler concludes that it can cache flag_ in a register and use that register instead of accessing the slower on-board memory. This is an excellent optimization for single-threaded code, but in this case, it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever.
It seems to be saying that Wait will never see the change to flag_, because its loop keeps reading a copy of flag_ cached in a register and so never notices the write that Wakeup makes from another thread; hence the use of volatile. Is this actually the case, or am I missing something?
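
To make the scenario concrete, here is a rough sketch of the Gadget the passage describes, as I understand it. Only Wait, Wakeup, flag_ and Sleep(1000) come from the quote; the class layout, the helper function and the shortened sleep are my own assumptions. The passage is about a plain bool flag_ (with volatile as the suggested fix). I've swapped in std::atomic<bool> so that the sketch is well-defined under the C++11 memory model and the loop is guaranteed to see the write. It is not the article's code.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical reconstruction of the Gadget from the quoted passage.
class Gadget {
public:
    // Loops until another thread calls Wakeup().
    void Wait() {
        while (!flag_.load()) {
            // Stand-in for Sleep(1000); shortened so the demo finishes quickly.
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    // Called from a different thread to release the waiter.
    void Wakeup() { flag_.store(true); }

private:
    // The passage discusses a plain `bool flag_` that the compiler may keep
    // in a register. An atomic load must observe the other thread's store,
    // so the loop above cannot be reduced to a single cached read.
    std::atomic<bool> flag_{false};
};

// Hypothetical helper: starts a waiter thread, wakes it, and reports
// whether Wait() actually returned.
bool WaitReturnsAfterWakeup() {
    Gadget g;
    std::atomic<bool> done{false};
    std::thread waiter([&] {
        g.Wait();
        done.store(true);
    });
    // Give the waiter time to enter its loop before waking it.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    g.Wakeup();
    waiter.join();
    return done.load();
}
```

With a plain bool in place of the atomic, the compiler would be allowed to read flag_ once before the loop, which is the "loops forever" outcome the passage warns about.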