CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 5 of 5 FirstFirst ... 2345
Results 61 to 68 of 68
  1. #61
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Hmm.....you'd only enter this case when the post-jump iterator points to 0 and the pre-jump doesn't, and if you assume that *usually* you won't be looking at a new longest in this case, then you're likely to find a non-0 by advancing the post-jump before you would find a 0 by advancing the pre-jump.

    However, after you do find a non-0 with the post-jump, you'd need to still advance the pre-jump by at least as much before you could definitively rule out the possibility that you're looking at a new longest sequence. And given that with a forward iterator you probably aren't dealing with contiguous memory, there's no cache benefit to doing all of the post-jump increments first. So I think it could be done alternately without any loss of efficiency, at least until the post-jump encounters a nonzero.

  2. #62
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    Hmm.....you'd only enter this case when the post-jump iterator points to 0 and the pre-jump doesn't, and if you assume that *usually* you won't be looking at a new longest in this case, then you're likely to find a non-0 by advancing the post-jump before you would find a 0 by advancing the pre-jump.

    However, after you do find a non-0 with the post-jump, you'd need to still advance the pre-jump by at least as much before you could definitively rule out the possibility that you're looking at a new longest sequence. And given that with a forward iterator you probably aren't dealing with contiguous memory, there's no cache benefit to doing all of the post-jump increments first. So I think it could be done alternately without any loss of efficiency, at least until the post-jump encounters a nonzero.
    true.

  3. #63
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: counting consecutive null characters

    Quote Originally Posted by monarch_dodra View Post
    Well, I had my fun.

    I won't bore you with everything I did, but the conclusion is that the jump algorithm applied to bidirectional iterators proved some good performance.

    After much tweaking, I got some good result for RA, BD, and forward iterators.

    My test bench was a pre-filled char array of 8096 chars randomly distributed between '0' and '1'. I do the search 100 000 times. There are 13 consecutive 0.

    My times:
    Standard array:
    Jump algorithm (RA): 0.73 seconds
    Standard algo: 3.1 seconds
    Since it took me awhile to come up with this, I *AM* going to bore you with details.

    1. As I’ve mentioned, Visual Studio 2005 optimized away the 100,000 loop. So I had to use QueryPerformanceCounter() on a single path throug the data. The results are in milliseconds.

    2. I’ve implemented “simple” algorithm, just for reference.

    3. I am using SSE to produce a 16-bit mask for each 16 bytes of data where bit is set if data byte is equal to a sample byte.

    4. I failed to implement ‘jump’ algorithm on bits (what I got was slower than other implementation).

    5. I’ve implemented a lookup table (BTW, first choice in any high-performance algorithm), where for each 16-bit word (all 64K of them) I pre-calculate length of leftmost sequense of 1’s (in case that previous word ended with 1), longest middle sequence (surrounded by 0’s), and the rightmost sequence (for possible roll-over to the next word).

    The result is about half the time of monarch_dodra's 'jump' algorithm.
    Table initialization is expensive, but will pay off at 1,000 iterations.

    Here are my results (on Core 2 Quad Q6600 @ 3GHz):

    vlad's 'table intialization'
    4.71281

    Simple sequential algorithm
    0.0388881
    13

    monarch_dodra's 'jump' algorithm
    0.00948323
    13

    vlad's 'SSE + table lookup' algorithm
    0.00439277
    13
    Attached is zipped solution. For non-Visual Studio users, just use the source files.
    Would really appreciate any comments.
    Attached Files Attached Files
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  4. #64
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: counting consecutive null characters

    Quote Originally Posted by VladimirF View Post
    ...
    I don't have time to look into it in detail right now, but i definitely want to sit down and see. I was planning to try to multi-thread the jump and see how fast it can get (in theory, up to 4x faster, which would beat your SSE )

  5. #65
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: counting consecutive null characters

    Quote Originally Posted by monarch_dodra View Post
    I don't have time to look into it in detail right now, but i definitely want to sit down and see. I was planning to try to multi-thread the jump and see how fast it can get (in theory, up to 4x faster, which would beat your SSE )
    At first I was going to say that I too can do multithreading. However, I am not having any luck with it. Even if I simply call one of my functions on a separate thread and wait for it to complete (no synchronization needed at all), it ads an overhead that is 10 times my total execution time. Doesn’t look like it depends on a sample data size, appears to be a fixed value ~80 microseconds. Does it seam reasonable?
    Anyway, I suspect that you will hit the same wall. Please let me know if I am wrong. Good luck!
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  6. #66
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Well, spawning a separate thread may not make sense to include in the timing. If you use a thread-pool design, simply dispatching tasks to be picked up by pre-existing threads, what then? Or maybe use something designed for fine-grained parallelism like OpenMP or ITBB.

    Of course, another option that we haven't talked about at all yet would be writing a GPU implementation. That's probably serious overkill for the problem, but hell....why not.

    The algorithm would be simple (assuming G80 architecture): Download the string of bytes to the GPU, then do a simple reduction, which can be computed in O(log n) relative to the size of the input string. This could probably all be done in one kernel using CUDA; it would take multiple renders using OpenGL+CG. I'm not sure about OpenCL, I need to read up on it. Then you pull the result back into main memory.

    The slowest part of the whole thing would probably be the latency due to the GPU download.

    The details of the reduction might be a bit tricky. I'll try to work them out.
    Last edited by Lindley; April 23rd, 2010 at 07:44 PM.

  7. #67
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    Well, spawning a separate thread may not make sense to include in the timing.
    Well, even if I simply do table initialization in multiple threads (it is timed separately, and does show 4-times performance improvement on 4-core processor), the main processing algorithm takes a LOT more time. I do not know what a connection here is. Possibly, optimizing compiler gets spooked by seeing access to the table from multiple threads. Does this sound reasonable?
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  8. #68
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    The details of the reduction might be a bit tricky. I'll try to work them out.
    Okay, here's a first shot. For background, a GPU shader usually gives you 4 channels to work with (RGBA) even if your data is only 1 channel at minimal extra cost. So we'll leverage that.

    In a reduction, you iteratively replace two values with one at every pair of positions in the array, effectively reducing its length by a factor of 2, all in parallel. You repeat until you've got only one value. We'll refer to the index of the current level of the reduction, and the original indices which reduced to it from the first level.

    The r channel will store the length of the longest sequence of 0s which a given index knows about. If nonzero, the g channel stores the length of any sequence of 0s ending at the minimum original index this index knows about. Thus g <= r. Similarly, b stores the length of any sequence of 0s ending at the maximum original index this index knows about. a is a flag which is 1 if this index knows about at least one nonzero original index.

    Then, each reduction step will look like:
    Z = reduce(X,Y)
    where X and Y are adjacent pixels of the previous level (ind(X) < ind(Y)) and Z is the new output,
    Code:
    int joincount = X.b + Y.g;
    if (!X.a)
        Z.g = joincount;
    else
        Z.g = X.g;
    if (!Y.a)
        Z.b = joincount;
    else
        Z.b = Y.b;
    Z.a = X.a || Y.a;
    Z.r = max(X.r,Y.r,joincount);
    In practice I expect this isn't the best GPU algorithm, since conditionals on a GPU are slow. But it could easily be rewritten as, for instance, Z.g = (1.0-X.a)*joincount + X.a*X.g;.
    Code:
    00324330000000204
    Pass 1: convert to 4-channel format
    {1,1,1,0}{1,1,1,0}{0,0,0,1}{0,0,0,1}{0,0,0,1}{0,0,0,1}{0,0,0,1}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{0,0,0,1}{1,1,1,0}{0,0,0,1}
    Pass 2: reduce (note correct behavior on the odd boundary condition)
    {2,2,2,0}{0,0,0,1}{0,0,0,1}{1,0,1,1}{2,2,2,0}{2,2,2,0}{2,2,2,0}{1,0,1,1}{0,0,0,1}
    Pass 3: reduce
    {2,2,0,1}{1,0,1,1}{4,4,4,0}{2,2,1,1}{0,0,0,1}
    Pass 4: reduce
    {2,2,1,1}{6,6,1,1}{0,0,0,1}
    Pass 5: reduce
    {7,2,1,1}{0,0,0,1}
    Pass 6: reduce
    {7,2,0,1}
    Output: 7
    I'm relying on the fact that an out-of-bounds access will produce {0,0,0,0}, which is easy enough to arrange for a GPU texture query.
    Last edited by Lindley; April 24th, 2010 at 10:17 PM.

Page 5 of 5 FirstFirst ... 2345

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured