find the median in a fix-sized moving window on a long sequency of data
Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldest data element is removed from the window and a new data element is pushed into the window (2) find the median of the data inside the window at each moving.
I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.
I am wondering if there is a O(n) time algorithm .
Re: find the median in a fix-sized moving window on a long sequency of data
if the data is sampled from a fixed probability distribution with finite mean, then the distance from the mean to the median in each window is constant on average. In this case, a multipass algorithm working with amortized O(n) time may be possible: the first pass computes the moving averages, then next passes move the computed values Vi toward the median using the counts of items <Vi and >Vi and the highest ( lowest ) item lower (greater) than V.
of course, such an algorithm would be useful only in a scenario where the window size varies proportionally with the sequence length ...
Re: find the median in a fix-sized moving window on a long sequency of data
I am guessing O(n log m) is about as good as you'll get. Superbono's suggestion nothwithstanding, I cannot think of a simple general-case algorithm.
Best Regards,
BioPhysEngr http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
Re: find the median in a fix-sized moving window on a long sequency of data
Originally Posted by dtustudy68
I am wondering if there is a O(n) time algorithm .
I don't know for sure but I have a suggestion.
Say there's a search tree that's used to keep the numbers within the sliding window in sorted order. The median is the middle number in the tree.
For each window number a pointer is kept to the internal tree node where the number is stored. It means numbers don't have to be searched in O(log N) from the root but can be reached directly in O(1).
I recall that under these circumstances a search tree can be updated in O(1). This means when the window moves (and one number leaves and another enters) the tree can be updated to reflect this change in O(1).
After an update either the median stays the same or its immediate successor or predessor in the tree becomes the new median. It should be possible to find these in O(1).
If the above holds the median can be found in O(1) after each window move. If it moves N times the algorithm becomes O(N). It could be worth investigating further.
Last edited by nuzzle; April 11th, 2012 at 03:31 PM.
My suggestion requires that succ and pred operations exist and can be performed in O(1). I don't know if exactly this tree can handle that but it seems reasonable. If an update can be made in O(1) finding a neighbour in O(1) shouldn't be far away.
Last edited by nuzzle; April 12th, 2012 at 12:29 AM.
Re: find the median in a fix-sized moving window on a long sequency of data
Originally Posted by dtustudy68
I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.
My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right. The total complexity will be the same but since many individual heap operations have O(1) complexity where a tree has O(log N) using heaps should be faster in practice.
For example in a fibonacci heap inserting a number is O(1). Finding the min number in a min-heap or the max number in a max-heap is also O(1). Further more a certain update operation called decreaseKey is O(1). It's when the new number is smaller than the number it replaces.
Unfortunately deleting a number still is O(log N) and this is why the total complexity isn't reduced with this suggestion. But it should be faster in practice and it shouldn't be too hard to locate a robust fibonacci heap implementation since it's a fairly standard data structure.
Last edited by nuzzle; April 12th, 2012 at 02:12 AM.
Bookmarks