find the median in a fix-sized moving window on a long sequency of data
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 7 of 7

Thread: find the median in a fix-sized moving window on a long sequency of data

  1. #1
    Join Date
    Jul 2010
    Posts
    5

    find the median in a fix-sized moving window on a long sequency of data

    Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldest data element is removed from the window and a new data element is pushed into the window (2) find the median of the data inside the window at each moving.

    I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.

    I am wondering if there is a O(n) time algorithm .

    Any help is appreciated.

  2. #2
    Join Date
    Oct 2008
    Posts
    1,102

    Re: find the median in a fix-sized moving window on a long sequency of data

    if the data is sampled from a fixed probability distribution with finite mean, then the distance from the mean to the median in each window is constant on average. In this case, a multipass algorithm working with amortized O(n) time may be possible: the first pass computes the moving averages, then next passes move the computed values Vi toward the median using the counts of items <Vi and >Vi and the highest ( lowest ) item lower (greater) than V.

    of course, such an algorithm would be useful only in a scenario where the window size varies proportionally with the sequence length ...

  3. #3
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,006

    Re: find the median in a fix-sized moving window on a long sequency of data

    I am guessing O(n log m) is about as good as you'll get. Superbono's suggestion nothwithstanding, I cannot think of a simple general-case algorithm.
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  4. #4
    Join Date
    May 2009
    Posts
    2,413

    Re: find the median in a fix-sized moving window on a long sequency of data

    Quote Originally Posted by dtustudy68 View Post
    I am wondering if there is a O(n) time algorithm .
    I don't know for sure but I have a suggestion.

    Say there's a search tree that's used to keep the numbers within the sliding window in sorted order. The median is the middle number in the tree.

    For each window number a pointer is kept to the internal tree node where the number is stored. It means numbers don't have to be searched in O(log N) from the root but can be reached directly in O(1).

    I recall that under these circumstances a search tree can be updated in O(1). This means when the window moves (and one number leaves and another enters) the tree can be updated to reflect this change in O(1).

    After an update either the median stays the same or its immediate successor or predessor in the tree becomes the new median. It should be possible to find these in O(1).

    If the above holds the median can be found in O(1) after each window move. If it moves N times the algorithm becomes O(N). It could be worth investigating further.
    Last edited by nuzzle; April 11th, 2012 at 03:31 PM.

  5. #5
    Join Date
    May 2009
    Posts
    2,413

    Re: find the median in a fix-sized moving window on a long sequency of data

    In support of my previous suggestion here's a link that shows updates in O(1) are possible indeed,

    http://www.tcs.fudan.edu.cn/rudolf/P...tree_ijfcs.pdf

    My suggestion requires that succ and pred operations exist and can be performed in O(1). I don't know if exactly this tree can handle that but it seems reasonable. If an update can be made in O(1) finding a neighbour in O(1) shouldn't be far away.
    Last edited by nuzzle; April 12th, 2012 at 12:29 AM.

  6. #6
    Join Date
    May 2009
    Posts
    2,413

    Re: find the median in a fix-sized moving window on a long sequency of data

    Quote Originally Posted by dtustudy68 View Post
    I can use 2 binary search trees to do it. it is O(n lg m), where m is window size.
    My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right. The total complexity will be the same but since many individual heap operations have O(1) complexity where a tree has O(log N) using heaps should be faster in practice.

    For example in a fibonacci heap inserting a number is O(1). Finding the min number in a min-heap or the max number in a max-heap is also O(1). Further more a certain update operation called decreaseKey is O(1). It's when the new number is smaller than the number it replaces.

    Unfortunately deleting a number still is O(log N) and this is why the total complexity isn't reduced with this suggestion. But it should be faster in practice and it shouldn't be too hard to locate a robust fibonacci heap implementation since it's a fairly standard data structure.
    Last edited by nuzzle; April 12th, 2012 at 02:12 AM.

  7. #7
    Join Date
    May 2009
    Posts
    2,413

    Re: find the median in a fix-sized moving window on a long sequency of data

    Quote Originally Posted by nuzzle View Post
    My second suggestion is to replace the binary search trees with fibonacci heaps, a max-heap for the left partition and a min-heap for the right.
    It looks like this suggestion actually is patented.

    http://www.google.com.tw/patents/about?id=kQwWAAAAEBAJ

    It's not that surprising really because median filtering is an important way to reduce random "salt and pepper" noise.

    Spontaneously I find it quite annoying that it's possible to patent things like this. But fortunately the patent is somewhat limited in scope to "implantable medical device". Also using other kinds of heaps (such as fibonacci) doesn't seem to infringe on this particular patent because it's clearly limited to binary heaps.

    Also it seems limited to finding the median in O(log N) where N is the sliding window size. So if someone actually comes up with an O(1) solution it most likely has substantial commersial value and it would be a good idea to first file for a patent before posting it here. And then make a killing by letting Medtronic and their worst competitors fight for exclusive rights.
    Last edited by nuzzle; April 13th, 2012 at 03:17 AM.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Azure Activities Information Page

Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center