Re: counting consecutive null characters
taken by a sudden spirit of rebellion I reread the paragraph on for_each and it (implicitly) allows side effects on its functor argument... mah... :confused:
therefore the following code should be standard correct
Code:
int main()
{
char data[] = {0, 1, 2, 3, 0, 0, 4 ,5 ,0, 0, 0, 6, 7};
int count = 0;
int largest = 0;
std::for_each( data, data + 13, [&]( char value ) { if( value ) count = 0; else largest = std::max( largest, ++count ); } );
}
a presumably correct alternative with accumulate would be
Code:
int main()
{
char data[] = {0, 1, 2, 3, 0, 0, 4 ,5 ,0, 0, 0, 6, 7};
struct MyCounter
{
MyCounter(): value(0), count(0) {}
MyCounter( int value, int count ): value(value), count(count) {}
MyCounter operator+( char a_char )
{
if( a_char ) count = 0; else ++count;
return MyCounter( std::max( value, count ), count );
}
int value;
int count;
};
MyCounter count = std::accumulate( data, data + 13, MyCounter() );
}
Re: counting consecutive null characters
Why all the hoopla for such simple task:
Code:
int main()
{
#define BUFSZ 13
int CurRun=0,MaxRun=0;
unsigned char buf[BUFSZ]={0,1,2,3,0,0,4,5,0,0,0,6,7};
for(size_t x=0;x<BUFSZ;x++){
buf[x]?(CurRun>MaxRun?MaxRun=CurRun:0,CurRun=0):CurRun++;
}
printf("Max: %d\n",MaxRun);
return 0;
}
(off the top of my head. It could probably be reduced even more).
Re: counting consecutive null characters
Quote:
Originally Posted by hoxsiew
Why all the hoopla for such simple task:
That looks like roughly the same algorithm as what JohnW@Wessex proposed in post #5, except that your implementation is more obfuscated. The posts in between have been attempts to answer (and discuss answers to) the original question by proposing things in the standard library beyond std::string and std::max.
Re: counting consecutive null characters
Quote:
Originally Posted by
aryan1
Performance is my first priority while implementing this task.
Looks like this thread departed from stated “first priority”.
There are some obvious algorithmic optimizations to this process.
For example, in JohnW@Wessex’s code when you find a beginning of sequence of zeroes, you can first check if an element at *(begin + maxCount) position is 0; if it is – keep checking for all zeroes, but if it isn’t – you can adjust your current position to (begin + maxCount) skipping maxCount elements.
Then, for larger maxCount, you could replace byte-by-byte compare with 4-bytes int compare to 0 (taking care of alignment, of course). Depending on expected maxCount, this could be very beneficial.
If the performance's priority is really high, we could keep going…
Re: counting consecutive null characters
Quote:
Originally Posted by
VladimirF
Looks like this thread departed from stated “first priority”.
There are some obvious algorithmic optimizations to this process.
For example, in JohnW@Wessex’s code when you find a beginning of sequence of zeroes, you can first check if an element at *(begin + maxCount) position is 0; if it is – keep checking for all zeroes, but if it isn’t – you can adjust your current position to (begin + maxCount) skipping maxCount elements.
Then, for larger maxCount, you could replace byte-by-byte compare with 4-bytes int compare to 0 (taking care of alignment, of course). Depending on expected maxCount, this could be very beneficial.
If the performance's priority is really high, we could keep going…
A binary search implementation of the "find the longest chain of a item" algorithm. Sounds like a challenge!
Re: counting consecutive null characters
Quote:
Originally Posted by
aryan1
what I want is to obtain the size of the "largest" sequence of consecutive NULL bytes,
Isn't an optimization possible here?
Once you've found a null sequence of length N you only have to check whether every N+1'th byte is null after that (but when you do find a null you need to look both backwards and forwards to establish the length of that particular null sequence). Still the longer the null sequence you already have the faster it becomes to check the rest of the bytes because you can increase the stride.
Re: counting consecutive null characters
Quote:
Originally Posted by
JohnW@Wessex
That looks a bit over complicated. How about...
Code:
#include <string>
#include <algorithm>
int checkConNullBytes(const std::string &dataStream)
{
int maxCount = 0;
int count = 0;
std::string::const_iterator begin = dataStream.begin();
std::string::const_iterator end = dataStream.end();
while (begin != end)
{
if (*begin == 0)
{
++count;
maxCount = std::max(maxCount, count);
}
else
{
count = 0;
}
++begin;
}
return maxCount;
}
int main()
{
char data[] = {0, 1, 2, 3, 0, 0, 4 ,5 ,0, 0, 0, 6, 7};
std::string dataStream(data, data + 13);
int largest = checkConNullBytes(dataStream);
}
If you expect the string to have mostly 0s, you should put the std::max call in the else rather than the if (prior to resetting count). If you expect the 0s to be fairly rare, it'll be faster as-is.
Re: counting consecutive null characters
Quote:
Originally Posted by nuzzle
Once you've found a null sequence of size N you only have to check whether every N+1'th byte is null after that (but when you find a null you need to look both backwards and forwards to establish the size of that particular null sequence). Still the larger the null sequence you already have the faster it becomes to check the rest of the bytes because you can increase the stride.
That has the taste of Boyer-Moore :)
Re: counting consecutive null characters
Quote:
Originally Posted by
monarch_dodra
A binary search implementation of the "find the longest chain of a item" algorithm. Sounds like a challenge!
I keep hinting to a programming challenge for awhile, with no support. Maybe to subtle?
If we only had sizeable data file with bunch of zeroes in it for benchmark… Would anybody accept such challenge?
Re: counting consecutive null characters
Quote:
Originally Posted by
nuzzle
Isn't an optimization possible here?
Once you've found a null sequence of length N you only have to check whether every N+1'th byte is null after that (but when you find a null you need to look both backwards and forwards to establish the length of that particular null sequence). Still the longer the null sequence you already have the faster it becomes to check the rest of the bytes because you can increase the stride.
I am trying to find the difference between your suggestion an my post # 19 above. Don’t see it :)
Re: counting consecutive null characters
Quote:
Originally Posted by
VladimirF
I keep hinting to a programming challenge for awhile, with no support. Maybe to subtle?
If we only had sizeable data file with bunch of zeroes in it for benchmark… Would anybody accept such challenge?
Why not? I'm already doing it anyways.
Actually, I'm implementing a different solution for each iterators type. You wouldn't want to skip too far ahead if you have input iterators...
Re: counting consecutive null characters
Quote:
Originally Posted by
VladimirF
I am trying to find the difference between your suggestion an my post # 19 above. Don’t see it :)
There were so many replies I just browsed them before I replied. But now when I look at your suggestion I can see it's based on the same idea.
One difference thought seems to be that you make a longer jump forward only after you've found a zero. In my version you switch to N+1 increments for the remainder of the search when you've found a sequence of length N. If you have found a null sequence of 10 bytes you only need to check every 11'th byte ever after.
Re: counting consecutive null characters
Hmm, except that if you *do* find a null on one of those jumps, you'll need to look backward to find the start of the sequence. Not a problem for bidirectional iterators, but for forward iterators it would be.
Re: counting consecutive null characters
Quote:
Originally Posted by
Lindley
Hmm, except that if you *do* find a null on one of those jumps, you'll need to look backward to find the start of the sequence.
Yes I mentioned that in #21.
Quote:
Not a problem for bidirectional iterators, but for forward iterators it would be.
With a little effort you can make almost everything not work.
Re: counting consecutive null characters
I suppose for forward iterators you're stuck looking at everything anyway, so you may as well do a simpler algorithm. Bidirectional would depend on the relative speed of conditionals versus traversal. It's a good optimization for random access iterators though.