I do a lot of work in high-performance mathematical algorithm design. Some of my work has benefited significantly from parallelization.

I have, at times,
- manually created dedicated threads for parallel matrix calculations
- manually created dedicated threads for parallel FFT operations
- manually separated chunks of large-number algorithms (those dealing with hundreds or thousands of digits or more) into parallelizable parts.

All of this work has involved tedious manual separation of algorithm chunks, and it has been limited to operations which are easily identified as parallelizable.

Are there any better ways? Are there methods for automating parallelization?

Consider a big-number class that performs many mathematical operations in a linear, sequential fashion, where each operation (particularly big-number multiplication) is large enough to benefit from parallelization. Some expressions, such as (a * b) + (c * d), can obviously be parallelized, because the two products are independent. On the other hand, many algorithms, such as series expansions, sum terms in which the nth term depends on the (n-1)th term. These operations seem inherently sequential, unless the number of terms is known in advance and the algorithm sets up something like a parallelized "for_each" mechanism.
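For concreteness, here is a minimal C++ sketch of the two cases above. It assumes a big-number type that provides `operator*` and `operator+` (the template parameter `T` stands in for it), and the names `sum_of_products` and `exp_series_parallel` are hypothetical. The first function runs the two independent products concurrently with `std::async`. The second shows one way a series with a term-to-term recurrence can still be split once the number of terms is fixed: each chunk computes its own starting term x^k/k! directly instead of waiting for the previous chunk, then applies the recurrence locally. Plain `double` is used for the series only to keep the sketch short.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Evaluate (a * b) + (c * d) with the two products computed concurrently.
// T is a stand-in for a (hypothetical) big-number class with * and +.
template <typename T>
T sum_of_products(const T& a, const T& b, const T& c, const T& d) {
    // Launch a * b on another thread while this thread computes c * d.
    std::future<T> ab = std::async(std::launch::async, [&a, &b] { return a * b; });
    T cd = c * d;
    return ab.get() + cd;  // get() waits for the other product to finish
}

// Sum the first n_terms of exp(x) = sum_k x^k / k!, split into chunks that
// run in parallel. Assumes n_chunks >= 1.
double exp_series_parallel(double x, std::size_t n_terms, std::size_t n_chunks) {
    std::vector<std::future<double>> parts;
    const std::size_t chunk = (n_terms + n_chunks - 1) / n_chunks;
    for (std::size_t start = 0; start < n_terms; start += chunk) {
        const std::size_t end = std::min(start + chunk, n_terms);
        parts.push_back(std::async(std::launch::async, [x, start, end] {
            // Compute this chunk's first term x^start / start! directly,
            // so the chunk does not depend on the previous chunk's last term.
            double term = 1.0;
            for (std::size_t k = 1; k <= start; ++k) term *= x / static_cast<double>(k);
            double sum = 0.0;
            for (std::size_t k = start; k < end; ++k) {
                sum += term;
                term *= x / static_cast<double>(k + 1);  // t_{k+1} = t_k * x / (k+1)
            }
            return sum;
        }));
    }
    double total = 0.0;
    for (auto& p : parts) total += p.get();  // combine partial sums in chunk order
    return total;
}
```

The price of breaking the dependency is redundant work: every chunk recomputes its starting term, so with big-number arithmetic this only pays off when the chunks are large relative to that overhead.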

Expanding on these ideas, I always think of EBNF grammars and of lexer and parser generators like flex and bison. A generated parser reads the input syntax, places operations on a sequential operation stack, and then carries them out. Are there any technologies for placing operations onto a parallelized operation stack? This seems quite difficult to me, almost like a research project (and I believe research is being done in this field). The mechanism would have to determine which operations can be done in parallel, synchronize them with previous, pending sequential results, and combine all of the synchronized and sequential results into a single answer at the end. It always sort of blows my mind.
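The "parallelized operation stack" described above resembles a dataflow (task-graph) model. Here is a minimal sketch under stated assumptions: one thread per operation via `std::async` (a real system would more likely use a thread pool or work-stealing scheduler), and dependencies declared by hand rather than derived automatically by a parser. Operations are appended in program order, as a parser would emit them, but each one lists the earlier results it needs and starts as soon as those are ready. `DataflowGraph`, `add`, and `result` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// A minimal dataflow "operation stack" (hypothetical class, for illustration).
template <typename T>
class DataflowGraph {
public:
    // An operation receives the values of its dependencies, in order.
    using Op = std::function<T(const std::vector<T>&)>;

    // Append an operation that consumes the results of earlier operations
    // `deps` (indices returned by previous add calls). Because only earlier
    // indices are accepted, the graph can never contain a cycle.
    std::size_t add(std::vector<std::size_t> deps, Op op) {
        std::vector<std::shared_future<T>> inputs;
        for (std::size_t d : deps) inputs.push_back(results_.at(d));
        results_.push_back(std::async(std::launch::async,
            [inputs = std::move(inputs), op = std::move(op)] {
                std::vector<T> args;
                // Synchronize: block until every dependency has produced its value.
                for (const auto& f : inputs) args.push_back(f.get());
                return op(args);
            }).share());
        return results_.size() - 1;
    }

    // Wait for operation i to finish and return its value.
    T result(std::size_t i) const { return results_.at(i).get(); }

private:
    std::vector<std::shared_future<T>> results_;
};
```

For (a * b) + (c * d), one would add the four operands as operations with no dependencies, then `a * b` and `c * d` (which run concurrently), then a sum that depends on both products. A parser could fill in the dependency lists itself from which intermediate results each expression reads.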

Is there any technology or research in this area? Does anyone know of research results or other literature on the automated parallelization of operation stacks, or of any similar technology?

Sincerely, Chris.