I'd start by taking another look at your overall design. While 100 threads may not be that bad, 2000 threads is excessive (unless you are running on a machine with many processors or cores). You may want to look into using a thread pool, which runs all of the work on a small, fixed number of threads.
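To illustrate the thread pool idea, here is a minimal sketch in Python (your language may differ, but most have an equivalent). The names `average_chunk` and `run_with_pool`, and the pool size of 8, are assumptions made up for this example, not something from your code.

```python
from concurrent.futures import ThreadPoolExecutor


def average_chunk(samples):
    """Hypothetical unit of work: average one batch of samples."""
    return sum(samples) / len(samples)


def run_with_pool(batches, max_workers=8):
    # A fixed number of worker threads handles every batch. Batches that
    # cannot run yet wait inside the pool instead of each getting a thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input batches.
        return list(pool.map(average_chunk, batches))


if __name__ == "__main__":
    # 2000 units of work, but only 8 threads ever exist at once.
    batches = [[i, i + 1, i + 2] for i in range(2000)]
    results = run_with_pool(batches)
    print(len(results))
```

The right pool size depends on whether the work is CPU-bound or mostly waiting on I/O, so treat the number above as a placeholder to tune.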

For the aggregation, consider having each thread average its own data, then use a shared queue to send the result from each thread to an 'aggregation' thread. The 'aggregation' thread just pulls the data out of the queue and aggregates it at a specified interval.
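A rough sketch of that queue-based arrangement, again in Python, might look like the following. Everything named here (`worker`, `aggregator`, `run`, the `SENTINEL` stop marker, and the 0.05 second interval) is hypothetical and only meant to show the shape of the idea.

```python
import queue
import threading
import time

SENTINEL = None  # hypothetical marker telling the aggregator to stop


def worker(samples, out_queue):
    # Each worker averages its own data and publishes only the result.
    out_queue.put(sum(samples) / len(samples))


def aggregator(in_queue, interval, results):
    # Wake up every `interval` seconds, drain whatever has arrived,
    # and record (count, mean) for that interval.
    done = False
    while not done:
        time.sleep(interval)
        batch = []
        while True:
            try:
                item = in_queue.get_nowait()
            except queue.Empty:
                break
            if item is SENTINEL:
                done = True
                break
            batch.append(item)
        if batch:
            results.append((len(batch), sum(batch) / len(batch)))


def run(all_samples, interval=0.05):
    shared = queue.Queue()  # queue.Queue is safe to share between threads
    results = []
    agg = threading.Thread(target=aggregator, args=(shared, interval, results))
    agg.start()
    workers = [threading.Thread(target=worker, args=(s, shared))
               for s in all_samples]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    shared.put(SENTINEL)  # sent only after every worker has finished
    agg.join()
    return results
```

One thing to watch: an average of per-thread averages only matches the overall average when every thread handles the same number of samples. If the counts differ, having each thread send a (sum, count) pair instead lets the aggregation thread weight them correctly.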