Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
There is a fundamental difference between working
well, and working
better. That is what this discussion has been about.
You probably missed
my post (the discussion was long), so I'm just going to quote myself:
It's like doing a binary search on a sorted list. Not great, but beats the alternative.
I saw your implementation and because you called it the "jump algorithm" I assumed it was based on the suggestion I made in #21. If not, what "alternative" is it that you're beating? It sure isn't my suggestion.
You mention Boyer Moore in a previous post. When you did were you aware that this algorithm has been referred to twice before in this thread already in relation to my suggestion? I took it as yet another queue that it was my suggestion you had implemented but maybe it was not.
And I realize now that it isn't. So the question is how much faster is my suggestion than your "binary" approach. :)
Really, the algorithm I suggested is the one to beat and that will be hard.
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
I saw your implementation and because you called it the "jump algorithm" I assumed it was based on the suggestion I made in #21. If not, what "alternative" is it that you're beating? It sure isn't my suggestion.
You mention Boyer Moore in a previous post. When you did were you aware that this algorithm has been referred to twice before in this thread already in relation to my suggestion? I took it as yet another queue that it was my suggestion you had implemented but maybe it was not.
And I realize now that it isn't. So the question is how much faster is my suggestion that your "binary" approach. :)
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
I had at first thought of a weird binary search approach, that would not have worked. This is why I called it "binary search", but it isn't. The current implementation I did was to skip forward N, an then look backwards and forwards when I find said character. Which is EXACTLY what your post 21 says. Is it not what you had in mind?
PS, the "alternative" is forward iterating through everything, counting.
PS2, if by "your algorithm", you mean the multi-threaded one, then I'n not going to try, but Honestly, I don't see it as very difficult. If it isn't, You'll have to clarify, because I'm pretty confident I already have ;)
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
I had at first thought of a weird binary search approach, that would not have worked. This is why I called it "binary search", but it isn't. The current implementation I did was to skip forward N, an then look backwards and forwards when I find said character. Which is EXACTLY what your post 21 says. Is it not what you had in mind?
PS, the "alternative" is forward iterating through everything, counting.
PS2, if by "your algorithm", you mean the multi-threaded one, then I'n not going to try, but Honestly, I don't see it as very difficult. If it isn't, You'll have to clarify, because I'm pretty confident I already have ;)
I don't quite know what you've implement because you haven't explained the principle. I was thrown off by your talks about "binary".
My suggestion is simple. You iterate through the array sequentially from the beginning to the end looking for zeroes. First you consider each character but as soon as you've found a zero sequence of length N you only have to consider every N+1'th character after that. Whenever you land on a zero you must investigate the length of the zero sequence it belongs to see if you've located a bigger N. The advantage is that the longer the zero sequence you've already found (the bigger the N), the more characters you can skip in each iteration.
That's my suggestion. If it's the same as yours we've independenly found the same solution. If not, yours may be better but it's not the naive approach you should compare it with anymore, it's my suggestion.
I'm fairly sure my suggestion is a special case of Boyer Moore which is well analyzed and known to be very efficient. That's why I can be so sure my suggestion is very hard to beat. But one should never say never. :)
PS: The concurrent version of my suggestion is just to make it even faster. It doesn't change anything in principle.
Re: counting consecutive null characters
Well I did exactly the same thing you said. The algorithm is not particularly complicated. Less than the generalized Boyer-Moore.
I may give the multy thread a try. I'll tell you how it comes along. Apparently, VladimirF is working on an ECC implementation.
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
We're not, we're saying pretty much exactly the same thing, just looking at the problem from different directions. It's just that nuzzle is focusing on how to improve the best case, and I'm focusing on how bad the worst case would be. Both very important questions when characterizing an algorithm.
There is one type of iterator that won't work at all with the jump algorithm----input iterators. Since they can't be copied, you can't use the "save a copy" trick to go "backwards" the way you can with forward iterators.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
We're not, we're saying pretty much exactly the same thing, just looking at the problem from different directions. It's just that nuzzle is focusing on how to improve the best case, and I'm focusing on how bad the worst case would be. Both very important questions when characterizing an algorithm.
In the worst case the "jump algorithm" performs like the naive algorithm. The best case depends on luck. The sooner a very long zero sequence is found and the fewer zero sequences that have to be investigated, the better the algorithm will perform.
I consider the choise of iterator a side track. It has nothing to do with the performance of the "jump algorithm". Any traditional strictly sequential iterator will impair the algorithms ability to make the jumps efficiently. As I said before, it's like forcing array sort methods to use sequential iterators for array accesses and then claim that you're studying the worst case performance of these methods. You're not, you're just introducing an artificial obstacle.
Re: counting consecutive null characters
You're missing the point.
It's not a side track, it's a different track, equally relevant to the overall problem. The problem is not "how good is the jump algorithm", the problem is "how do we count consecutive zeros."
The jump algorithm is a good solution in some cases. There are other cases it doesn't apply to, and solving the consecutive 0s problem entirely (at least, using an STL-type algorithm interface) *requires* that we provide a means of handling them.
Clearly, the jump algorithm is a good option in the random access iterator case. Just as clearly, the naive algorithm or something close to it is required in the input iterator case.
The interesting question from my perspective becomes, what about intermediate cases? Forward iterators and bidirectional iterators could (and, judging by monarch's numbers, do) get some benefit from the jump algorithm even though they aren't ideally suited to it. Is that the best approach, or is there a better algorithm for those iterator types? That's the angle I'm still thinking about.
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
As I said before, it's like forcing array sort methods to use sequential iterators for array accesses and then claim that you're studying the worst case performance of these methods. You're not, you're just introducing an artificial obstacle.
Well the way I see it, an algorithm can either accept a forward iterator, or it can reject a forward iterator. If the algorithm accepts it, then it is the implementer's job to make sure he chose the best algorithm for said iterator type.
It is not incoherent to imagine that someone may want to know the longest sequence inside an slist. Why not find the best algorithm for that?
Quote:
Originally Posted by
Lindley
Forward iterators and bidirectional iterators could (and, judging by monarch's numbers, do) get some benefit from the jump algorithm
Actually, I didn't do forward iterators because the implementation was straying too far away from RA, and I did not feel like doing it.
I still believe that we can achieve good results using jump for forward iterators if we modify the jump:
If after a jump of N, you land on a 0. Say you start by counting forward 3 zeros. Now, you still have your original pre-jump forward iterator. If after iterating forward 3 times, you encounter a single non-zero, then that's it, the current chain is smaller than N. Skip to the next jump. Tons of compares saved!
Overall, you'd probably iterate a bit more than the normal algorithm, but also save that much more comparisons. Of course, the longer the input, the closer to 0 these numbers get.
I'll try to benchmark that tonight. After I finish on the multi thread RA.
Re: counting consecutive null characters
You mean iterating both the pre-jump and post-jump forward iterators alternately, until you can be sure you aren't looking at a new longest sequence, then leapfrogging them with another jump? Yeah, that would work.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
You mean iterating both the pre-jump and post-jump forward iterators alternately, until you can be sure you aren't looking at a new longest sequence, then leapfrogging them with another jump? Yeah, that would work.
Yes. Although not alternatively, just start with the post jump, and then do the pre-jump.
Re: counting consecutive null characters
Hmm.....you'd only enter this case when the post-jump iterator points to 0 and the pre-jump doesn't, and if you assume that *usually* you won't be looking at a new longest in this case, then you're likely to find a non-0 by advancing the post-jump before you would find a 0 by advancing the pre-jump.
However, after you do find a non-0 with the post-jump, you'd need to still advance the pre-jump by at least as much before you could definitively rule out the possibility that you're looking at a new longest sequence. And given that with a forward iterator you probably aren't dealing with contiguous memory, there's no cache benefit to doing all of the post-jump increments first. So I think it could be done alternately without any loss of efficiency, at least until the post-jump encounters a nonzero.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
Hmm.....you'd only enter this case when the post-jump iterator points to 0 and the pre-jump doesn't, and if you assume that *usually* you won't be looking at a new longest in this case, then you're likely to find a non-0 by advancing the post-jump before you would find a 0 by advancing the pre-jump.
However, after you do find a non-0 with the post-jump, you'd need to still advance the pre-jump by at least as much before you could definitively rule out the possibility that you're looking at a new longest sequence. And given that with a forward iterator you probably aren't dealing with contiguous memory, there's no cache benefit to doing all of the post-jump increments first. So I think it could be done alternately without any loss of efficiency, at least until the post-jump encounters a nonzero.
true.
1 Attachment(s)
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
Well, I had my fun.
I won't bore you with everything I did, but the conclusion is that the jump algorithm applied to bidirectional iterators proved some good performance.
After much tweaking, I got some good result for RA, BD, and forward iterators.
My test bench was a pre-filled char array of 8096 chars randomly distributed between '0' and '1'. I do the search 100 000 times. There are 13 consecutive 0.
My times:
Standard array:
Jump algorithm (RA): 0.73 seconds
Standard algo: 3.1 seconds
Since it took me awhile to come up with this, I *AM* going to bore you with details.
1. As I’ve mentioned, Visual Studio 2005 optimized away the 100,000 loop. So I had to use QueryPerformanceCounter() on a single path throug the data. The results are in milliseconds.
2. I’ve implemented “simple” algorithm, just for reference.
3. I am using SSE to produce a 16-bit mask for each 16 bytes of data where bit is set if data byte is equal to a sample byte.
4. I failed to implement ‘jump’ algorithm on bits (what I got was slower than other implementation).
5. I’ve implemented a lookup table (BTW, first choice in any high-performance algorithm), where for each 16-bit word (all 64K of them) I pre-calculate length of leftmost sequense of 1’s (in case that previous word ended with 1), longest middle sequence (surrounded by 0’s), and the rightmost sequence (for possible roll-over to the next word).
The result is about half the time of monarch_dodra's 'jump' algorithm.
Table initialization is expensive, but will pay off at 1,000 iterations.
Here are my results (on Core 2 Quad Q6600 @ 3GHz):
Quote:
vlad's 'table intialization'
4.71281
Simple sequential algorithm
0.0388881
13
monarch_dodra's 'jump' algorithm
0.00948323
13
vlad's 'SSE + table lookup' algorithm
0.00439277
13
Attached is zipped solution. For non-Visual Studio users, just use the source files.
Would really appreciate any comments.
Re: counting consecutive null characters
Quote:
Originally Posted by
VladimirF
...
I don't have time to look into it in detail right now, but i definitely want to sit down and see. I was planning to try to multi-thread the jump and see how fast it can get (in theory, up to 4x faster, which would beat your SSE ;))
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
I don't have time to look into it in detail right now, but i definitely want to sit down and see. I was planning to try to multi-thread the jump and see how fast it can get (in theory, up to 4x faster, which would beat your SSE ;))
At first I was going to say that I too can do multithreading. However, I am not having any luck with it. Even if I simply call one of my functions on a separate thread and wait for it to complete (no synchronization needed at all), it ads an overhead that is 10 times my total execution time. Doesn’t look like it depends on a sample data size, appears to be a fixed value ~80 microseconds. Does it seam reasonable?
Anyway, I suspect that you will hit the same wall. Please let me know if I am wrong. Good luck!
Re: counting consecutive null characters
Well, spawning a separate thread may not make sense to include in the timing. If you use a thread-pool design, simply dispatching tasks to be picked up by pre-existing threads, what then? Or maybe use something designed for fine-grained parallelism like OpenMP or ITBB.
Of course, another option that we haven't talked about at all yet would be writing a GPU implementation. That's probably serious overkill for the problem, but hell....why not.
The algorithm would be simple (assuming G80 architecture): Download the string of bytes to the GPU, then do a simple reduction, which can be computed in O(log n) relative to the size of the input string. This could probably all be done in one kernel using CUDA; it would take multiple renders using OpenGL+CG. I'm not sure about OpenCL, I need to read up on it. Then you pull the result back into main memory.
The slowest part of the whole thing would probably be the latency due to the GPU download.
The details of the reduction might be a bit tricky. I'll try to work them out.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
Well, spawning a separate thread may not make sense to include in the timing.
Well, even if I simply do table initialization in multiple threads (it is timed separately, and does show 4-times performance improvement on 4-core processor), the main processing algorithm takes a LOT more time. I do not know what a connection here is. Possibly, optimizing compiler gets spooked by seeing access to the table from multiple threads. Does this sound reasonable?
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
The details of the reduction might be a bit tricky. I'll try to work them out.
Okay, here's a first shot. For background, a GPU shader usually gives you 4 channels to work with (RGBA) even if your data is only 1 channel at minimal extra cost. So we'll leverage that.
In a reduction, you iteratively replace two values with one at every pair of positions in the array, effectively reducing its length by a factor of 2, all in parallel. You repeat until you've got only one value. We'll refer to the index of the current level of the reduction, and the original indices which reduced to it from the first level.
The r channel will store the length of the longest sequence of 0s which a given index knows about. If nonzero, the g channel stores the length of any sequence of 0s ending at the minimum original index this index knows about. Thus g <= r. Similarly, b stores the length of any sequence of 0s ending at the maximum original index this index knows about. a is a flag which is 1 if this index knows about at least one nonzero original index.
Then, each reduction step will look like:
Z = reduce(X,Y)
where X and Y are adjacent pixels of the previous level (ind(X) < ind(Y)) and Z is the new output,
Code:
int joincount = X.b + Y.g;
if (!X.a)
Z.g = joincount;
else
Z.g = X.g;
if (!Y.a)
Z.b = joincount;
else
Z.b = Y.b;
Z.a = X.a || Y.a;
Z.r = max(X.r,Y.r,joincount);
In practice I expect this isn't the best GPU algorithm, since conditionals on a GPU are slow. But it could easily be rewritten as, for instance, Z.g = (1.0-X.a)*joincount + X.a*X.g;.
Code:
00324330000000204
Pass 1: convert to 4-channel format
{1,1,1,0}{1,1,1,0}{0,0,0,1}{0,0,0,1}{0,0,0,1}{0,0,0,1}{0,0,0,1}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{1,1,1,0}{0,0,0,1}{1,1,1,0}{0,0,0,1}
Pass 2: reduce (note correct behavior on the odd boundary condition)
{2,2,2,0}{0,0,0,1}{0,0,0,1}{1,0,1,1}{2,2,2,0}{2,2,2,0}{2,2,2,0}{1,0,1,1}{0,0,0,1}
Pass 3: reduce
{2,2,0,1}{1,0,1,1}{4,4,4,0}{2,2,1,1}{0,0,0,1}
Pass 4: reduce
{2,2,1,1}{6,6,1,1}{0,0,0,1}
Pass 5: reduce
{7,2,1,1}{0,0,0,1}
Pass 6: reduce
{7,2,0,1}
Output: 7
I'm relying on the fact that an out-of-bounds access will produce {0,0,0,0}, which is easy enough to arrange for a GPU texture query.