>> What platform will this be running on? What I/O is this memory mapped to?
Your solution depends on the answers to these questions. If this is an embedded device or a driver whose memory is mapped to physical controllers/registers with the same bit layout as GENERIC_COMMAND/COMPLETION_WAIT, then you'll need volatile on every pointer that points into that mapped region. You'll also need to make sure your structures have the required packing and alignment so that accesses to the mapped I/O occur as intended.

If that isn't the case, then you don't need volatile anywhere.
So in the end, what will this code be running on? An embedded device? As a device driver on a PC? As a user-mode application on a PC?
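
For the memory-mapped case, here is a rough sketch of what "volatile on the pointer plus checked layout" could look like. The CommandRegs fields, the submit() helper and the "opcode is written last" rule are all made up for illustration, since the real layout of GENERIC_COMMAND isn't shown in this thread. An ordinary object stands in for the mapped region so the snippet can run on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical register block; the field names are assumptions,
// not the real GENERIC_COMMAND layout.
struct CommandRegs {
    std::uint32_t opcode;
    std::uint32_t address_lo;
    std::uint32_t address_hi;
    std::uint32_t data;
};

// Compile-time checks that the struct matches the layout we expect,
// i.e. the compiler inserted no padding between or after the fields.
static_assert(std::is_standard_layout<CommandRegs>::value, "need a predictable layout");
static_assert(sizeof(CommandRegs) == 16, "unexpected padding in CommandRegs");
static_assert(offsetof(CommandRegs, data) == 12, "unexpected field offset");

// Every access goes through a pointer-to-volatile, so the compiler has to
// emit each store and may not merge or reorder them relative to each other.
inline void submit(volatile CommandRegs* regs, std::uint32_t opcode,
                   std::uint64_t address, std::uint32_t data) {
    regs->address_lo = static_cast<std::uint32_t>(address);
    regs->address_hi = static_cast<std::uint32_t>(address >> 32);
    regs->data = data;
    regs->opcode = opcode;  // assumption: writing the opcode starts the command
}

// Stand-in for the mapped region. On real hardware the pointer would come
// from the platform's mapping call or a fixed address instead.
inline CommandRegs demo_submit() {
    static CommandRegs fake_device{};
    submit(&fake_device, 0x1u, 0x1122334455667788ull, 0xABCDu);
    return fake_device;
}
```

If the hardware needs a layout the compiler wouldn't produce naturally, that's where compiler-specific packing attributes come in, and the static_asserts will tell you when the layout drifts.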

>> line:82: generic = command;
You need a non-volatile operator=(), since the left-hand side (generic) is non-volatile.
Code:
template <typename T>
GENERIC_COMMAND& operator=(const T& rhs) {...}
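
To show how the two overloads can sit side by side, here is a minimal sketch. The words[4] payload, the size check and the memcpy-based copy are assumptions for illustration only, since the real members and the body of operator=() aren't shown here. With both overloads present, a non-volatile left-hand side picks the non-volatile one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical payloads; the real structs have named bit-fields.
struct COMPLETION_WAIT {
    std::uint32_t words[4];
};

struct GENERIC_COMMAND {
    std::uint32_t words[4];

    // Non-volatile overload: chosen for "generic = command;" when
    // generic is a plain (non-volatile) object.
    template <typename T>
    GENERIC_COMMAND& operator=(const T& rhs) {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivial type");
        static_assert(sizeof(T) == sizeof(GENERIC_COMMAND), "command size mismatch");
        std::memcpy(words, &rhs, sizeof(words));
        return *this;
    }

    // Volatile overload: chosen when the left-hand side is volatile,
    // e.g. an object living in a mapped region. It returns void so that
    // no volatile reference is left dangling as a discarded result.
    template <typename T>
    void operator=(const T& rhs) volatile {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivial type");
        static_assert(sizeof(T) == sizeof(GENERIC_COMMAND), "command size mismatch");
        std::uint32_t tmp[4];
        std::memcpy(tmp, &rhs, sizeof(tmp));
        // memcpy can't write to volatile storage, so store word by word.
        for (std::size_t i = 0; i < 4; ++i) {
            words[i] = tmp[i];
        }
    }
};

// Exercises both overloads and reports whether the copies landed.
inline bool demo_assign() {
    COMPLETION_WAIT command{{1u, 2u, 3u, 4u}};

    GENERIC_COMMAND generic{};
    generic = command;               // non-volatile overload

    volatile GENERIC_COMMAND mapped{};
    mapped = command;                // volatile overload

    return generic.words[3] == 4u && mapped.words[0] == 1u;
}
```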
gg