The problem is that your approach is O(n^2) and it's possible to solve the problem in O(n log(n)). I actually ran a few tests yesterday and found that using std::sort to sort the range first and then remove duplicate trivially is the fastest. Using the STL heap_sort is about 1.5 times slower but has the same growth characteristics.
