# find the median in a fix-sized moving window on a long sequency of data

• April 6th, 2012, 11:12 AM
dtustudy68
find the median in a fix-sized moving window on a long sequency of data
Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldest data element is removed from the window and a new data element is pushed into the window (2) find the median of the data inside the window at each moving.

I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.

I am wondering if there is a O(n) time algorithm .

Any help is appreciated.
• April 7th, 2012, 03:14 AM
superbonzo
Re: find the median in a fix-sized moving window on a long sequency of data
if the data is sampled from a fixed probability distribution with finite mean, then the distance from the mean to the median in each window is constant on average. In this case, a multipass algorithm working with amortized O(n) time may be possible: the first pass computes the moving averages, then next passes move the computed values Vi toward the median using the counts of items <Vi and >Vi and the highest ( lowest ) item lower (greater) than V.

of course, such an algorithm would be useful only in a scenario where the window size varies proportionally with the sequence length ...
• April 7th, 2012, 11:38 PM
BioPhysEngr
Re: find the median in a fix-sized moving window on a long sequency of data
I am guessing O(n log m) is about as good as you'll get. Superbono's suggestion nothwithstanding, I cannot think of a simple general-case algorithm.
• April 11th, 2012, 02:52 PM
nuzzle
Re: find the median in a fix-sized moving window on a long sequency of data
Quote:

Originally Posted by dtustudy68
I am wondering if there is a O(n) time algorithm .

I don't know for sure but I have a suggestion.

Say there's a search tree that's used to keep the numbers within the sliding window in sorted order. The median is the middle number in the tree.

For each window number a pointer is kept to the internal tree node where the number is stored. It means numbers don't have to be searched in O(log N) from the root but can be reached directly in O(1).

I recall that under these circumstances a search tree can be updated in O(1). This means when the window moves (and one number leaves and another enters) the tree can be updated to reflect this change in O(1).

After an update either the median stays the same or its immediate successor or predessor in the tree becomes the new median. It should be possible to find these in O(1).

If the above holds the median can be found in O(1) after each window move. If it moves N times the algorithm becomes O(N). It could be worth investigating further.
• April 12th, 2012, 12:21 AM
nuzzle
Re: find the median in a fix-sized moving window on a long sequency of data
In support of my previous suggestion here's a link that shows updates in O(1) are possible indeed,

http://www.tcs.fudan.edu.cn/rudolf/P...tree_ijfcs.pdf

My suggestion requires that succ and pred operations exist and can be performed in O(1). I don't know if exactly this tree can handle that but it seems reasonable. If an update can be made in O(1) finding a neighbour in O(1) shouldn't be far away.
• April 12th, 2012, 01:22 AM
nuzzle
Re: find the median in a fix-sized moving window on a long sequency of data
Quote:

Originally Posted by dtustudy68
I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.

My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right. The total complexity will be the same but since many individual heap operations have O(1) complexity where a tree has O(log N) using heaps should be faster in practice.

For example in a fibonacci heap inserting a number is O(1). Finding the min number in a min-heap or the max number in a max-heap is also O(1). Further more a certain update operation called decreaseKey is O(1). It's when the new number is smaller than the number it replaces.

Unfortunately deleting a number still is O(log N) and this is why the total complexity isn't reduced with this suggestion. But it should be faster in practice and it shouldn't be too hard to locate a robust fibonacci heap implementation since it's a fairly standard data structure.
• April 13th, 2012, 01:43 AM
nuzzle
Re: find the median in a fix-sized moving window on a long sequency of data
Quote:

Originally Posted by nuzzle
My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right.

It looks like this suggestion actually is patented.