I was reading the Kode Vicious column in the April 2010 issue of Communications of the ACM. He was discussing data structures as the canon of computer science, like elementary equations are the canon of physics. The four data structures listed were the array, list, tree, and hash table.
I know that there are concurrent and thread-safe implementations of these structures and variations on them (e.g., vectors and hash maps in TBB). One variation on the simple tree structure is the self-balancing tree, which maintains a bounded search depth as items are added or removed. I've always thought that a thread-safe red-black or AVL tree would be hard to implement without locking the entire structure for rebalancing.
Has anyone used or implemented a thread-safe self-balancing tree? Is there a key idea that avoids locking the entire tree while nodes are being shifted around? Or is this kind of structure too complex to implement efficiently?
I've never implemented or used a self-balancing tree that employs techniques allowing simultaneous readers and writers.
I've read literature on the subject here: http://www.cl.cam.ac.uk/research/srg/netos/lock-free/ (some code too).
Keir Fraser's dissertation, "Practical lock freedom", covers both lock-based and STM-based techniques for red-black trees. His paper "Concurrent programming without locks" then analyzes and compares the performance of various red-black tree implementations.
One interesting point in those papers is that skip-lists provide similar run-time guarantees, but are much easier (relatively speaking) to parallelize.
I personally try to follow the principles of K.I.S.S. in most designs, which pretty much excludes the use of most lock-free/wait-free algorithms.
There is a technique called PCOW (partial copy-on-write); it works for any kind of recursive data structure (lists, trees). It provides serializable operations and zero-overhead, 100% scalable wait-free read accesses. Write accesses are lock-free. The technique requires some form of PDR (partial copy-on-write deferred reclamation, a limited form of GC) to work.
Readers operate exactly as in the single-threaded variant, except that loads of node pointers must use memory_order_consume ordering (which is basically costless on all modern hardware).
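A minimal sketch of that reader path in C++, assuming a hypothetical node layout with atomic child pointers (the names Node and contains are illustrative, not from any particular library):

```cpp
#include <atomic>

// Hypothetical node layout for a PCOW-friendly binary search tree.
// Child pointers are atomic so readers can traverse concurrently with
// writers that publish whole replacement subtrees via CAS.
struct Node {
    int key;
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
    explicit Node(int k) : key(k) {}
};

// Reader: identical to the single-threaded search, except every child
// load uses memory_order_consume (current compilers promote this to
// acquire, which is still cheap on mainstream hardware).
bool contains(const Node* root, int key) {
    const Node* n = root;
    while (n) {
        if (key == n->key) return true;
        n = (key < n->key)
            ? n->left.load(std::memory_order_consume)
            : n->right.load(std::memory_order_consume);
    }
    return false;
}
```

Note that the reader takes no locks and performs no reference counting, which is what makes the read side wait-free and scalable.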
All the burden is on writers (real programs frequently have very high read/write ratios, and writes to shared data don't scale anyway, so who cares?). If a writer wants to modify a set of nodes (say, rotate part of a tree), it finds a serialization point (the common parent of all nodes to be modified), builds a complete replacement for the affected part of the tree (everything below the serialization point; note that the replacement can reuse existing nodes), atomically updates the serialization point (swings the pointer to the new sub-tree with a CAS), and then defers deletion of the old sub-tree. That's it.
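The writer's publish step can be sketched as follows, assuming the same hypothetical Node type as above; defer_delete stands in for a real PDR scheme (epoch- or hazard-pointer-based reclamation), and replace_subtree is an illustrative name, not an established API:

```cpp
#include <atomic>
#include <vector>

// Hypothetical node layout (same as the reader sketch above).
struct Node {
    int key;
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
    explicit Node(int k) : key(k) {}
};

// Stand-in for a real PDR scheme: old subtrees are parked here until
// no reader can still be traversing them, then freed.
std::vector<Node*> retired;
void defer_delete(Node* subtree_root) { retired.push_back(subtree_root); }

// Writer: swing `link` (the serialization point) from the expected old
// subtree to a freshly built replacement in one CAS. On failure a
// concurrent writer got there first; the caller rebuilds its
// replacement against the new subtree and retries, which is what makes
// the write side lock-free rather than wait-free.
bool replace_subtree(std::atomic<Node*>& link, Node* expected,
                     Node* replacement) {
    if (!link.compare_exchange_strong(expected, replacement,
                                      std::memory_order_release)) {
        return false;  // lost the race; caller rebuilds and retries
    }
    defer_delete(expected);  // reclaimed later, never freed immediately
    return true;
}
```

Readers that already descended into the old subtree keep traversing a consistent (if slightly stale) snapshot until reclamation, which is why the deferred deletion is essential.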