find the median in a fix-sized moving window on a long sequency of data

Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldest data element is removed from the window and a new data element is pushed into the window (2) find the median of the data inside the window at each moving.

I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.

I am wondering if there is a O(n) time algorithm .

Any help is appreciated.

Re: find the median in a fix-sized moving window on a long sequency of data

if the data is sampled from a fixed probability distribution with finite mean, then the distance from the mean to the median in each window is constant on average. In this case, a multipass algorithm working with amortized O(n) time may be possible: the first pass computes the moving averages, then next passes move the computed values Vi toward the median using the counts of items <Vi and >Vi and the highest ( lowest ) item lower (greater) than V.

of course, such an algorithm would be useful only in a scenario where the window size varies proportionally with the sequence length ...

Re: find the median in a fix-sized moving window on a long sequency of data

I am guessing O(n log m) is about as good as you'll get. Superbono's suggestion nothwithstanding, I cannot think of a simple general-case algorithm.

Re: find the median in a fix-sized moving window on a long sequency of data

Quote:

Originally Posted by

**dtustudy68**
I am wondering if there is a O(n) time algorithm .

I don't know for sure but I have a suggestion.

Say there's a search tree that's used to keep the numbers within the sliding window in sorted order. The median is the middle number in the tree.

For each window number a pointer is kept to the internal tree node where the number is stored. It means numbers don't have to be searched in O(log N) from the root but can be reached directly in O(1).

I recall that under these circumstances a search tree can be updated in O(1). This means when the window moves (and one number leaves and another enters) the tree can be updated to reflect this change in O(1).

After an update either the median stays the same or its immediate successor or predessor in the tree becomes the new median. It should be possible to find these in O(1).

If the above holds the median can be found in O(1) after each window move. If it moves N times the algorithm becomes O(N). It could be worth investigating further.

Re: find the median in a fix-sized moving window on a long sequency of data

In support of my previous suggestion here's a link that shows updates in O(1) are possible indeed,

http://www.tcs.fudan.edu.cn/rudolf/P...tree_ijfcs.pdf

My suggestion requires that succ and pred operations exist and can be performed in O(1). I don't know if exactly this tree can handle that but it seems reasonable. If an update can be made in O(1) finding a neighbour in O(1) shouldn't be far away.

Re: find the median in a fix-sized moving window on a long sequency of data

Quote:

Originally Posted by

**dtustudy68**
I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.

My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right. The total complexity will be the same but since many individual heap operations have O(1) complexity where a tree has O(log N) using heaps should be faster in practice.

For example in a fibonacci heap inserting a number is O(1). Finding the min number in a min-heap or the max number in a max-heap is also O(1). Further more a certain update operation called decreaseKey is O(1). It's when the new number is smaller than the number it replaces.

Unfortunately deleting a number still is O(log N) and this is why the total complexity isn't reduced with this suggestion. But it should be faster in practice and it shouldn't be too hard to locate a robust fibonacci heap implementation since it's a fairly standard data structure.

Re: find the median in a fix-sized moving window on a long sequency of data

Quote:

Originally Posted by

**nuzzle**
My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right.

It looks like this suggestion actually is patented.

http://www.google.com.tw/patents/about?id=kQwWAAAAEBAJ

It's not that surprising really because median filtering is an important way to reduce random "salt and pepper" noise.

Spontaneously I find it quite annoying that it's possible to patent things like this. But fortunately the patent is somewhat limited in scope to "implantable medical device". Also using other kinds of heaps (such as fibonacci) doesn't seem to infringe on this particular patent because it's clearly limited to binary heaps.

Also it seems limited to finding the median in O(log N) where N is the sliding window size. So if someone actually comes up with an O(1) solution it most likely has substantial commersial value and it would be a good idea to first file for a patent before posting it here. And then make a killing by letting Medtronic and their worst competitors fight for exclusive rights. :)