Click to See Complete Forum and Search --> : Find largest integers in array?


getpagesize
February 1st, 2011, 05:41 PM
We have an array A of size n, which contains random positive integers of any size, i.e. not sorted.

What algorithm can find the k largest positive integers in O(n log k) time? Where k is k <= n/2.

I can see a bottom-up heap, which can be constructed in O(n) time, then each removeMax() from the heap would take O(log n) time, and when this is iterated k times it gives: O(k log n).
How do you obtain O(n log k) though? I smell that this has something to do with using one of the well-known sorting algorithms, Quicksort? Mergesort? Heapsort? But which one, and how do you justify that you get O(n log k)?

All of these best sorting algorithms can sort the entire array in O(n log n) time (well, except Quicksort, if you want to be completely formal). But which one can sort k of them in O(n log k) time?

laserlight
February 2nd, 2011, 06:27 AM
I can see a bottom-up heap, which can be constructed in O(n) time, then each removeMax() from the heap would take O(log n) time, and when this is iterated k times it gives: O(k log n).
How do you obtain O(n log k) though?
What if you created a min-heap out of the first k integers instead? Then, you iterate over the remaining integers: whenever the current integer is greater than the minimum in the heap, you remove that minimum and insert the new integer, rebuilding the heap as needed.

All of these best sorting algorithms can sort the entire array in O(n log n) time (well, except Quicksort, if you want to be completely formal).
Quicksort does have O(n log n) complexity in the average case, so it is not that we are not being "completely formal". Speaking of average case, you could use the O(n) algorithm based on quicksort's partitioning scheme(s), then provide it as a solution since the set of algorithms with complexity in O(n) is a subset of the set of algorithms with complexity in O(n log k).

getpagesize
February 2nd, 2011, 09:47 PM
What if you created a min-heap out of the first k integers instead? Then, you iterate over the remaining integers: whenever the current integer is greater than the minimum in the heap, you remove that minimum and insert the new integer, rebuilding the heap as needed.

Seems reasonable, you take the first k integers and create a heap at a cost of O(k) as it is bottom-up.
Then with this heap with k elements you can perform operations in O(log k) time on it.
If all of the remaining n-k integers are greater than the minimum of the heap, then it will cost O((n-k) log k) to add them back in.

Now that you have constructed the heap and removed/added elements as needed, how do you actually extract the largest k elements from the heap? Just to be clear, you are using a min-heap, with the smallest integer in the root node?

laserlight
February 2nd, 2011, 10:30 PM
Now that you have constructed the heap and removed/added elements as needed, how do you actually extract the largest k elements from the heap? Just to be clear, you are using a min-heap, with the smallest integer in the root node?
Think: what data structure would you normally use to store a heap? (I am not referring to a binary tree.) Note that all you want is the largest k elements: the order of these elements don't matter.

Just to be clear, you are using a min-heap, with the smallest integer in the root node?
Yes.