Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
It's not a "choice" of iterators. The coder writing the algorithm does not necessarily control what type of iterator they're going to get when the algorithm is called. They could put a restriction on that, but the better option is to fall back to a simpler, less efficient algorithm if the iterator type provided doesn't work with the fast approach.
The proposed "jump" optimization can be applied best to random-access iterators, and for those I'm sure it works fantastically. I was simply speculating on how well it would extend to the other types, if at all.
And yes, I did grasp why it's such a good idea.
Well the "jump optimization" can be applied to almost all iterator types. The less "random access" the iterator is, the slower the algorithm, but probably better than the "dumb" algorithm regardless.
RA works very fast.
Bidirectional requires many iterations for the jumps, but does less comparisons
Forward iterator also works, if you keep the "previous" iterator in memory.
I was surprised, but I tested it, and there are gains for all iterator types.
The only iterator where the "jump" algorithm does not work are input iterators*...
I did not realize it at first, but I guess it is a generalization of the Boyer Moore algorithm.
*I had said in some previous posts that the algorithm did not work on forward iterators, but I meant input.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
The proposed "jump" optimization can be applied best to random-access iterators, and for those I'm sure it works fantastically. I was simply speculating on how well it would extend to the other types, if at all.
The "jump algorithm" doesn't work well with traditional sequential iterators. The reason is obvious. It's because they don't support the element skipping the "jump algorithm" is based on.
Compare with array sorting algorithms. If you enforce accesses to be via iterator even the fastest sorting algorithm becomes slow like a dinosaur in a tar pit. Just like array sorting the "jump algorithm" needs random access.
So you can stop speculating now. Introducing an obstacle like an iterator is guaranteed to spoil the performance of the "jump algorithm". The only kind of "iterator" that will work well is one that's not strictly sequential, like for example the C++ random-access iterator.
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
I was surprised, but I tested it, and there are gains for all iterator types.
The reason for that probably is that although a sequential iterator must visit all elements during a skip, it doesn't have to access them. It just quickly passes the elements whereas the naive approach must stay and check whether each element is zero. Still, making a random access skip is faster.
Re: counting consecutive null characters
It's also worth mentioning that the "jump algorithm" performs better on a muticore processor. Let's say you have four cores. This means if you divide the array in four sections you can run four concurrent "jump algorithms" at the same time. This gives a theoretical speedup of four times.
But what's even better is that those four "jump algorithms" can share a skip length counter. This means that when one of the "jump algorithms" finds a longer zero sequence this finding is instantly shared with the three other "jump algorithms" allowing them to make longer jumps too.
So the concurrent "jump algorithms" don't just tug along each with their own part of the array. They cooperate so that all of them uses the longest jump found by any one of them.
A note. If someone plans to implement this make sure not to split the array in the middle of a zero sequence. Also handling of the shared skip length counter requires some care.
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
So you can stop speculating now. Introducing an obstacle like an iterator is guaranteed to spoil the performance of the "jump algorithm". The only kind of "iterator" that will work well is one that's not strictly sequential, like for example the C++ random-access iterator.
There is a fundamental difference between working well, and working better. That is what this discussion has been about.
You probably missed my post (the discussion was long), so I'm just going to quote myself:
Quote:
My times:
Standard array:
Jump algorithm (RA): 0.73 seconds
Standard algo: 3.1 seconds
List:
Jump algorithm (BD): 2.2 seconds
Standard algo: 3.5 seconds
It's like doing a binary search on a sorted list. Not great, but beats the alternative.
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
There is a fundamental difference between working
well, and working
better. That is what this discussion has been about.
You probably missed
my post (the discussion was long), so I'm just going to quote myself:
It's like doing a binary search on a sorted list. Not great, but beats the alternative.
I saw your implementation and because you called it the "jump algorithm" I assumed it was based on the suggestion I made in #21. If not, what "alternative" is it that you're beating? It sure isn't my suggestion.
You mention Boyer Moore in a previous post. When you did were you aware that this algorithm has been referred to twice before in this thread already in relation to my suggestion? I took it as yet another queue that it was my suggestion you had implemented but maybe it was not.
And I realize now that it isn't. So the question is how much faster is my suggestion than your "binary" approach. :)
Really, the algorithm I suggested is the one to beat and that will be hard.
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
I saw your implementation and because you called it the "jump algorithm" I assumed it was based on the suggestion I made in #21. If not, what "alternative" is it that you're beating? It sure isn't my suggestion.
You mention Boyer Moore in a previous post. When you did were you aware that this algorithm has been referred to twice before in this thread already in relation to my suggestion? I took it as yet another queue that it was my suggestion you had implemented but maybe it was not.
And I realize now that it isn't. So the question is how much faster is my suggestion that your "binary" approach. :)
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
I had at first thought of a weird binary search approach, that would not have worked. This is why I called it "binary search", but it isn't. The current implementation I did was to skip forward N, an then look backwards and forwards when I find said character. Which is EXACTLY what your post 21 says. Is it not what you had in mind?
PS, the "alternative" is forward iterating through everything, counting.
PS2, if by "your algorithm", you mean the multi-threaded one, then I'n not going to try, but Honestly, I don't see it as very difficult. If it isn't, You'll have to clarify, because I'm pretty confident I already have ;)
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
I had at first thought of a weird binary search approach, that would not have worked. This is why I called it "binary search", but it isn't. The current implementation I did was to skip forward N, an then look backwards and forwards when I find said character. Which is EXACTLY what your post 21 says. Is it not what you had in mind?
PS, the "alternative" is forward iterating through everything, counting.
PS2, if by "your algorithm", you mean the multi-threaded one, then I'n not going to try, but Honestly, I don't see it as very difficult. If it isn't, You'll have to clarify, because I'm pretty confident I already have ;)
I don't quite know what you've implement because you haven't explained the principle. I was thrown off by your talks about "binary".
My suggestion is simple. You iterate through the array sequentially from the beginning to the end looking for zeroes. First you consider each character but as soon as you've found a zero sequence of length N you only have to consider every N+1'th character after that. Whenever you land on a zero you must investigate the length of the zero sequence it belongs to see if you've located a bigger N. The advantage is that the longer the zero sequence you've already found (the bigger the N), the more characters you can skip in each iteration.
That's my suggestion. If it's the same as yours we've independenly found the same solution. If not, yours may be better but it's not the naive approach you should compare it with anymore, it's my suggestion.
I'm fairly sure my suggestion is a special case of Boyer Moore which is well analyzed and known to be very efficient. That's why I can be so sure my suggestion is very hard to beat. But one should never say never. :)
PS: The concurrent version of my suggestion is just to make it even faster. It doesn't change anything in principle.
Re: counting consecutive null characters
Well I did exactly the same thing you said. The algorithm is not particularly complicated. Less than the generalized Boyer-Moore.
I may give the multy thread a try. I'll tell you how it comes along. Apparently, VladimirF is working on an ECC implementation.
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
To be perfectly honest, I still haven't fully understood if you and Lindley suggested different things.
We're not, we're saying pretty much exactly the same thing, just looking at the problem from different directions. It's just that nuzzle is focusing on how to improve the best case, and I'm focusing on how bad the worst case would be. Both very important questions when characterizing an algorithm.
There is one type of iterator that won't work at all with the jump algorithm----input iterators. Since they can't be copied, you can't use the "save a copy" trick to go "backwards" the way you can with forward iterators.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
We're not, we're saying pretty much exactly the same thing, just looking at the problem from different directions. It's just that nuzzle is focusing on how to improve the best case, and I'm focusing on how bad the worst case would be. Both very important questions when characterizing an algorithm.
In the worst case the "jump algorithm" performs like the naive algorithm. The best case depends on luck. The sooner a very long zero sequence is found and the fewer zero sequences that have to be investigated, the better the algorithm will perform.
I consider the choise of iterator a side track. It has nothing to do with the performance of the "jump algorithm". Any traditional strictly sequential iterator will impair the algorithms ability to make the jumps efficiently. As I said before, it's like forcing array sort methods to use sequential iterators for array accesses and then claim that you're studying the worst case performance of these methods. You're not, you're just introducing an artificial obstacle.
Re: counting consecutive null characters
You're missing the point.
It's not a side track, it's a different track, equally relevant to the overall problem. The problem is not "how good is the jump algorithm", the problem is "how do we count consecutive zeros."
The jump algorithm is a good solution in some cases. There are other cases it doesn't apply to, and solving the consecutive 0s problem entirely (at least, using an STL-type algorithm interface) *requires* that we provide a means of handling them.
Clearly, the jump algorithm is a good option in the random access iterator case. Just as clearly, the naive algorithm or something close to it is required in the input iterator case.
The interesting question from my perspective becomes, what about intermediate cases? Forward iterators and bidirectional iterators could (and, judging by monarch's numbers, do) get some benefit from the jump algorithm even though they aren't ideally suited to it. Is that the best approach, or is there a better algorithm for those iterator types? That's the angle I'm still thinking about.
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
As I said before, it's like forcing array sort methods to use sequential iterators for array accesses and then claim that you're studying the worst case performance of these methods. You're not, you're just introducing an artificial obstacle.
Well the way I see it, an algorithm can either accept a forward iterator, or it can reject a forward iterator. If the algorithm accepts it, then it is the implementer's job to make sure he chose the best algorithm for said iterator type.
It is not incoherent to imagine that someone may want to know the longest sequence inside an slist. Why not find the best algorithm for that?
Quote:
Originally Posted by
Lindley
Forward iterators and bidirectional iterators could (and, judging by monarch's numbers, do) get some benefit from the jump algorithm
Actually, I didn't do forward iterators because the implementation was straying too far away from RA, and I did not feel like doing it.
I still believe that we can achieve good results using jump for forward iterators if we modify the jump:
If after a jump of N, you land on a 0. Say you start by counting forward 3 zeros. Now, you still have your original pre-jump forward iterator. If after iterating forward 3 times, you encounter a single non-zero, then that's it, the current chain is smaller than N. Skip to the next jump. Tons of compares saved!
Overall, you'd probably iterate a bit more than the normal algorithm, but also save that much more comparisons. Of course, the longer the input, the closer to 0 these numbers get.
I'll try to benchmark that tonight. After I finish on the multi thread RA.
Re: counting consecutive null characters
You mean iterating both the pre-jump and post-jump forward iterators alternately, until you can be sure you aren't looking at a new longest sequence, then leapfrogging them with another jump? Yeah, that would work.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
You mean iterating both the pre-jump and post-jump forward iterators alternately, until you can be sure you aren't looking at a new longest sequence, then leapfrogging them with another jump? Yeah, that would work.
Yes. Although not alternatively, just start with the post jump, and then do the pre-jump.