Quote: Originally posted by dommmm
Now this should work, right? However, it's possibly the least elegant piece of code ever, plus it doesn't deal with a different bit width. Any tips on how I could make this better, or make it work if it doesn't already?
There's a massive performance gain lying around in the second for loop. Each iteration of the outer loop reuses (bitWidth - 1) of the elements that went into the previous iteration's average. So if you keep a running sum of squares and update it on each iteration (add the square of the element that enters the window, subtract the square of the one that drops out), the complexity goes from O(n * bitWidth) to O(n).
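To illustrate the running-sum idea, here is a minimal sketch. Since I'm not working from your exact code, it is in Python and makes some assumptions: that you want the mean of squares over a sliding window of bitWidth elements, and that the names `windowed_mean_squares`, `samples` and `window` are placeholders of mine, not anything from your program. Adapt it to whatever your average actually is.

```python
def windowed_mean_squares(samples, window):
    """Mean of squares over every full sliding window, in O(n).

    Hypothetical helper: 'window' plays the role of bitWidth.
    """
    if window <= 0 or window > len(samples):
        return []

    # Sum of squares for the first window, computed once.
    running = sum(x * x for x in samples[:window])
    result = [running / window]

    # Slide the window: add the square of the element that enters,
    # subtract the square of the element that leaves.
    for i in range(window, len(samples)):
        entering = samples[i]
        leaving = samples[i - window]
        running += entering * entering - leaving * leaving
        result.append(running / window)

    return result


print(windowed_mean_squares([1, 2, 3, 4], 2))
```

With integer input the running sum stays exact. With floating-point input the repeated add/subtract can accumulate rounding error over long runs, so you may want to recompute the sum from scratch every so often.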

I would still advise, though, that you first make sure the algorithm is correct before you start on such optimizations.