CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 3 of 5 FirstFirst 12345 LastLast
Results 31 to 45 of 68
  1. #31
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    I suppose for forward iterators you're stuck looking at everything anyway, so you may as well do a simpler algorithm. Bidirectional would depend on the relative speed of conditionals versus traversal. It's a good optimization for random access iterators though.
    That's exactly how binary_search works. it does just as many iterations, but a fraction of the checks.

    I was hoping to rely on distance and advance to write a common algorithm for both, but it creates a lot of problems. For instance, it is very hard to know if you have reached the end of your boundaries if you try to jump forward.

  2. #32
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: counting consecutive null characters

    Well, I had my fun.

    I won't bore you with everything I did, but the conclusion is that the jump algorithm applied to bidirectional iterators proved some good performance.

    After much tweaking, I got some good result for RA, BD, and forward iterators.

    My test bench was a pre-filled char array of 8096 chars randomly distributed between '0' and '1'. I do the search 100 000 times. There are 13 consecutive 0.

    My times:
    Standard array:
    Jump algorithm (RA): 0.73 seconds
    Standard algo: 3.1 seconds

    List:
    Jump algorithm (BD): 2.2 seconds
    Standard algo: 3.5 seconds

    EDIT: mingw -O3 on a 3.0Ghz Phenom 9950

    For those interested, here is my code:

    Code:
    template<typename iterator, typename T>
    typename std::iterator_traits<iterator>::difference_type
    findLongestSequence(iterator first, iterator last, T value)
    {
        return findLongestSequenceImplementation(
            first,
            last,
            value,
            typename std::iterator_traits<iterator>::iterator_category()
        );
    }
    
    template<typename iterator, typename T>
    typename std::iterator_traits<iterator>::difference_type
    findLongestSequenceImplementation(iterator first, iterator last, T value, std::random_access_iterator_tag)
    {
        iterator middle = first;
    
        //quick check of first sequence, to never check bounds when iterating backwards.
        while(middle!=last && *middle==value)
        {
            ++middle;
        }
    
        //fast initialization
        typename std::iterator_traits<iterator>::difference_type maxCount = std::distance(first, middle); //potentially 0
        typename std::iterator_traits<iterator>::difference_type step = maxCount+1;
    
        ++middle;
        while(middle<last) //This check makes things difficult for bidirectional iterator
        {
            if (*middle == value)
            {
                iterator low = middle;
    
                //No need to check bounds here
                while(*--low == value);
                ++low;
    
                while((++middle != last) && (*middle == value));
    
                //since the if is wrong 90% of the time, evaluating distance twice is faster
                if(maxCount < std::distance(low, middle))
                {
                    maxCount = std::distance(low, middle);
                    step = maxCount+1;
                }
            }
            std::advance(middle, step);
        }
        return maxCount;
    }
    
    template<typename iterator, typename T>
    typename std::iterator_traits<iterator>::difference_type
    findLongestSequenceImplementation(iterator first, iterator last, T value, std::bidirectional_iterator_tag)
    {
        typename std::iterator_traits<iterator>::difference_type maxCount = 0;
        iterator middle = first;
    
        //quick check of first sequence, to never check bounds when iterating backwards.
        while(middle!=last && *middle==value)
        {
            ++middle;
            ++maxCount;
        }
        if(middle==last) return maxCount;
    
        typename std::iterator_traits<iterator>::difference_type step = maxCount+1;
    
        while(1)
        {
            if (*middle == value)
            {
                iterator low = middle;
                typename std::iterator_traits<iterator>::difference_type aDistance = 1;
    
                //No need to check bounds here
                while(*--low == value)
                {++aDistance;}
    
                while((++middle != last) && (*middle == value))
                {++aDistance;}
    
                if(maxCount < aDistance)
                {
                    maxCount = aDistance;
                    step = maxCount+1;
                }
            }
    
            for(typename std::iterator_traits<iterator>::difference_type i = 0; i!=step; ++middle, ++i)
            {
                if (middle==last) return maxCount;
            }
        }
    }
    
    
    template<typename iterator, typename T>
    typename std::iterator_traits<iterator>::difference_type
    findLongestSequenceImplementation(iterator first, iterator last, T value, std::forward_iterator_tag)
    {
        typename std::iterator_traits<iterator>::difference_type curCount=0;
        typename std::iterator_traits<iterator>::difference_type maxCount=0;
        while(first != last)
        {
            //search for the first occurence of value
            if (*first++ == value)
            {
                //get length of current occurence
                curCount = 1;
                while((first != last) && (*first++ == value))
                {
                    ++curCount;
                }
    
                //test max only once the current chain is counted
                maxCount = std::max(maxCount, curCount);
            }
        }
        return maxCount;
    }
    Code:
    int main()
    {
        char a[]= //see below
    
        std::list<char> aList(a, a+sizeof(a));
    
        clock_t begTime = clock();
        for(int i=0; i<100000; ++i)
        {
            findLongestSequence(a, a+sizeof(a), '0');
            //findLongestSequence(aList.begin(), aList.end(), '0');
        }
        std::cout << clock()-begTime << std::endl;
    }
    And here is my char array

    Code:
    "11101000101110111001010111001111110101010000101110011011000100100010100111000110"
    "11001111111110011010011101010100010001110101101101000111110111000101010111110010"
    "00001001000000011111000100110101001010110110001100001111001111110111001101011111"
    "10001001001100100000111101011011001100010000010101011111001000111111000101001100"
    "11010000000111111101010011011001001001101000001111111000101000110100100010111001"
    "11011000001110000111000101001001101100011010001111010011111111101011000100011100"
    "00001100110011000001010000011110000010010001100011110101101010010011001100110111"
    "00100101111010110010101000100001100011101010110001010011000010000011010110101010"
    "01001100011110110111110010100000111111110111110111100011000010110010001011000000"
    "11100010001010011001110110100011101111111101111100000100011110100101110001101011"
    "10101001010110110111011011111011110110111011110011011101101100010101000001011011"
    "10001100010010010100101100011010010101000110011100000101010110010000000100011100"
    "11011010010111011011101101111000111110000111000011010010101001100000001010011010"
    "01101000101101011100011111101111011101100100010100000011000000110010000010000010"
    "10001101110101001010001000100001010101010111100010110001111011010010110100100000"
    "00010101110010100000000111000011111010001100110101101111111110110110110001101110"
    "00000101100111011110011000111111010101111000011100100100110100101110101100011111"
    "00110010111001111011110000101111010011110110100001010010111100010011110110110011"
    "11101011000001000100011011110100111000011011111001010110111010100000111000101100"
    "11000001000011010000101001111101000000100111011000111001101100111101011011000100"
    "00011010100000000110100111010110011110111110110011011000100110010101010010010011"
    "10011101011110111101000101111000000001100111001100011000001001010001110110010010"
    "10010011101001110100010001100100000011000001111010101011101110001011000110001111"
    "10010010011010001110001000111000000011101110000001001010011100110111010111010010"
    "01011011011101111010100111010101000111011101100011010100110001111110000000111001"
    "01001110101001111110000001101100001111111011110000101011010111101000111100101010"
    "11001111110011111000000110111111000110010010011111010100100101011000111110110001"
    "01011100101111100101110100110100011001000001011100100111110001001110111011011001"
    "00010001100001100011000010000100011001011001101010010010110111010011111110111101"
    "00000011111001000100010111111011110100011000100111101001101110111000011100111100"
    "10111000001010101010001000011111010110000101010011000000110011101110100010100000"
    "10101010010011101110100100011010101000111111000100010100000100010011110011010110"
    "00110000110110011110001001101100001101011101011000010011000101110001010001000100"
    "10011000000010010110000101101101000011110100011001011111100101110110101001000000"
    "11001000100110001111100111101000000100010000011111111000110000001001001101000010"
    "10000001001111100011011010011011010101101011001010011011101001100110000100110000"
    "11101100010110011110001101111010010001001011011010000000101011100101000111110101"
    "01001111100010100001011011111100011010011111110011101010101110011010111010111101"
    "11000010110011011100110001010111001100010111101001011010011110011101000100001010"
    "01100100101001000001110111000011001100010101111110010110000111011011110111101101"
    "01010110100101001000001101000100001111001100100000101011000000110001101000100000"
    "11010101000110110111111000011110011001111000011000111111000110001111101111011001"
    "01101101011111100011011000110010111001011111111001101010001011111011001111001010"
    "01010100100001000101000000010010001000111101010011100000111000100101100100010101"
    "11100011001111110101110000101010001111011101010111011010111011100011100000010001"
    "01010111100101110010011101110110111001100000110100011100011000000100111100110000"
    "01100010000000011111101001111100010011110111001010010111100110101100101010110110"
    "00001100100000110000101111000011000011110111111110001000011000100011010010100001"
    "00111110111100111011000101001101100110010100001010000100100010111111111011111101"
    "00010101000001010101110110010010010001110011110101001011100100011001101000010001"
    "00001000000110101000111000101100111110101101010001110001010011101100110110101001"
    "00111001010000000011001110010111101111011100001111011001001011101001111001111011"
    "00000110110010101100011001100100100111000101110101001000000101100100110011010001"
    "01111010110010010011010000101011110111100011110000000010010010000010001001100001"
    "01111110011100101000010001011000001101000101010000011100000010110111011100011110"
    "11001011011001001111111101010110100110011101000110101001110011101011010111011110"
    "01000110101100101101011000000111110111100100011110101100100010101111101111110011"
    "10111001000100000010001111110101001011111011000100110110001101001010100011000100"
    "00101101011110100000100111110010000011111100110110100010111000100110101010100111"
    "00011100101101110100000101010110011100111010010111010010011100111110111000010101"
    "00001001111011101110111100100100100000101010110011010011110100100110100101010111"
    "11100100010101101000000111000110010110011100010011110110101100011111000010011110"
    "01111111011100001011001001100110010001110001010001111111011010110010100000011011"
    "01111011011100111111000001110000100101000000001111010101001011110111110010011010"
    "01101111100101010100010110011110110110111110011001101011010010110000111000101100"
    "00101111101111111000000011111011101011110000111001001000011101011001110110110011"
    "00100110000011110011100111100111000011001100110111011111011100100111010011110001"
    "01110001011000110110100011110100111010010001000001111000101110100010100111111111"
    "11110101001100100100110010001010100000010011000101001000011000010100111010010111"
    "00101000001111001101011000111011011010011111100011111100110100110000100011001111"
    "00100011000100101100010000100010011001001000001001001101010010100111010010011111"
    "11111100111010010000010110001111010101101111110101110100101001101110010110100000"
    "01101011100111101010010001110110110110000101011100011011010011100000010101001110"
    "00110010101111101101001010001001001011110100101001100011000000100000010110000100"
    "10001110101001111110111000110110111111000001000000000001110011000110011101101101"
    "01011110110110000111010001111100100001001011010000110010111001111001000111001000"
    "11111101101101000110000110011001110111110100111010000011111000010110010101100001"
    "10100000001001011000000011000011101011100001000000010100000010001101100010100001"
    "10000010100111100111001011010001011100100001100101101000001010000010101010001011"
    "11000100111111110001111010101001001110001010100100001000110101011011110010101000"
    "01101010100000010001010011101111011110001111011111011100100001001110010110110111"
    "10100011001100111001001100010101101111110011001001110101010110111100110111100010"
    "11000100101111111001011001100011111001100011011011101110111111000011100110011110"
    "00110001010110000110110011000010100110100011000101010110100111110001111000110011"
    "00011101001010111011000100000110011111000100001000000000010110100111101110111011"
    "01000101011011001101011010110111110101101011010100011111000010101101010111000011"
    "11111100010011011101110000010001011010111010011011111011101110111001010000001011"
    "00001011001011011110010000110100111011100110011101101101110010011010110110010011"
    "01111100110110000110000110110101001110101100001101100100100000101110111110010101"
    "10010110111000101100111111100010011001110111111001111100010001000101011101110111"
    "10101000001001110101100110001011000101101100100100001101010100101001100000011010"
    "11000110111000100111100010011010110101000111111100110000111011100011011000001100"
    "00010110000010011100100010010111000000100011000101001011100111100111000000000101"
    "11110011101101010110001001010111001110010001110011101110110100110101011101100101"
    "10011111110101010111101010101010011111110111101100011100110110011100111010011110"
    "01110011001111001110111000110000001100110010011000110011111110101100010011000111"
    "01110111011001110111101000101010010110100111101111101011010001000001001101001110"
    "00000000000111010110111001101110111101110100000101110001000111011011101010010011"
    "11000110101110100110000110101011000101111010111000100000000000001110010001101101"
    "00111001111011100111000110001110000111100110001010010100000011111111001111110010"
    "11100011111100000001101100111000011000111010110101001000011010000111011000011101"
    "10010110010011110011100000000110100100010011010111100101001100110111111101110101"
    "11000101101001101001010000101110";
    Last edited by monarch_dodra; April 20th, 2010 at 03:04 AM.

  3. #33
    Join Date
    Aug 2005
    Location
    San Diego, CA
    Posts
    1,054

    Question Re: counting consecutive null characters

    Quote Originally Posted by superbonzo View Post
    taken by a sudden spirit of rebellion I reread the paragraph on for_each and it (implicitly) allows side effects on its functor argument... mah...

    I feel your pain. I wonder why the C++ standard is so vague about this. I mean why else would for_each return a functor if not to allow the functor to accumulate data that can be retrieved after the algorithm finishes? Yet the std does not explicitly state anything about that. It reads,
    Notes: If f returns a result, the result is ignored.
    What does that mean? Ignored by who or what?

  4. #34
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Well, keep in mind that you can always use the accumulate() algorithm if you need stateful behavior; it's allowed there.

  5. #35
    Join Date
    Aug 2005
    Location
    San Diego, CA
    Posts
    1,054

    Lightbulb Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    Well, keep in mind that you can always use the accumulate() algorithm if you need stateful behavior; it's allowed there.

    I understand but it simply sums values with operator+. for_each appears to support far more complex behavior. What if I wanted to analyze a range of objects and build some metrics? Isn't that one of the reasons that for_each exists for? That is what I would have thought by looking at the implementations that I currently see but the standard is pretty vague and scott meyers in effective c++ writes as though he is rather perplexed about for_each as well. It shouldn't be confusing at all. The std should just tell us what the algorithm is for and what its restrictions are very concisely.

  6. #36
    Join Date
    Nov 2008
    Location
    England
    Posts
    748

    Re: counting consecutive null characters

    Josuttis shows exactly that use of the for_each return value in his book. His example shows taking the mean of a sequence with a stateful functor and for_each.
    Get Microsoft Visual C++ Express here or CodeBlocks here.
    Get STLFilt here to radically improve error messages when using the STL.
    Get these two can't live without C++ libraries, BOOST here and Loki here.
    Check your code with the Comeau Compiler and FlexeLint for standards compliance and some subtle errors.
    Always use [code] code tags [/code] to make code legible and preserve indentation.
    Do not ask for help writing destructive software such as viruses, gamehacks, keyloggers and the suchlike.

  7. #37
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Quote Originally Posted by kempofighter View Post
    I understand but it simply sums values with operator+. for_each appears to support far more complex behavior.
    Actually, accumulate() supports an arbitrary functor argument. std:lus is merely the default.

    However, I actually got it backwards: according to Effective STL, for_each() is the one which is allowed to have a stateful functor; accumulate() is not. The two are similar otherwise.
    Last edited by Lindley; April 15th, 2010 at 10:40 PM.

  8. #38
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    Hmm, except that if you *do* find a null on one of those jumps, you'll need to look backward to find the start of the sequence. Not a problem for bidirectional iterators, but for forward iterators it would be.
    There's no reason why you maybe couldn't have multiple forward iterators to simulate the look back.
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman

  9. #39
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    However, I actually got it backwards
    Further evidence that the algorithm limitations need to be better defined or, better still, remove as many limitations as possible and make them more compatible with stateful functors & predicates. This seems to be how most people expect them to work.

    I read an article a while ago by one of the people on the C++ committee discussing the STL. Apparently the committee was mainly made up of compiler and library writers. Maybe a few coders who work at the application sharp end would give them a better grounding in what is really useful.
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman

  10. #40
    Join Date
    Oct 2008
    Posts
    1,456

    Re: counting consecutive null characters

    Quote Originally Posted by JohnW@Wessex View Post
    Further evidence that the algorithm limitations need to be better defined or, better still, remove as many limitations as possible and make them more compatible with stateful functors & predicates. This seems to be how most people expect them to work.

    I read an article a while ago by one of the people on the C++ committee discussing the STL. Apparently the committee was mainly made up of compiler and library writers. Maybe a few coders who work at the application sharp end would give them a better grounding in what is really useful.
    On the other hand it's worth noting that the no-side effects requirement in the accumulate algorithm took me to the overloaded operator+ solution which better fits the semantics of std::accumulate... that is a good thing ( of course, you could say that it was my fault from the beginning ).

  11. #41
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    Actually, accumulate() supports an arbitrary functor argument. std:lus is merely the default.

    However, I actually got it backwards: according to Effective STL, for_each() is the one which is allowed to have a stateful functor; accumulate() is not. The two are similar otherwise.
    I told you (well, superbonzo actually) that it was dangerous and confusing lol ! Any chance you could enlighten us quickly as to why that is? What makes the two so different?

    Quote Originally Posted by JohnW@Wessex View Post
    There's no reason why you maybe couldn't have multiple forward iterators to simulate the look back.
    That's true, but for this algorithm, it would kind of defeat the purpose. No point "jumping ahead" if you are just going to iterate through everything just to see where the range starts later. Well, you might save a few comparisons, but nothing drastic.
    Last edited by monarch_dodra; April 16th, 2010 at 03:58 AM.

  12. #42
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    Quote Originally Posted by monarch_dodra View Post
    Any chance you could enlighten us quickly as to why that is? What makes the two so different?
    I'm just repeating what I read in Effective STL. Even Scott Meyers shrugs and dodges the question regarding why they're different.

  13. #43
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: counting consecutive null characters

    Quote Originally Posted by monarch_dodra View Post
    My test bench was a pre-filled char array of 8096 chars randomly distributed between '0' and '1'. I do the search 100 000 times. There are 13 consecutive 0.

    My times:
    Standard array:
    Jump algorithm (RA): 0.73 seconds
    Standard algo: 3.1 seconds
    1. This is very impressive!
    2. I couldn't beat it with my ideas.
    3. Wasted some time on a fight with Visual Studio's optimization.
    4. I can't rate this post at the moment. Will have to spread some stuff around first. Is this some kind of redistribution of wealth?
    5. I suspect that overhead from my ideas overweight the benefits for such a short longest sequence of 13.
    6. Have some promising results from SSE. Needs some more work.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  14. #44
    Join Date
    May 2009
    Posts
    2,413

    Re: counting consecutive null characters

    Quote Originally Posted by Lindley View Post
    I suppose for forward iterators you're stuck looking at everything anyway, so you may as well do a simpler algorithm. Bidirectional would depend on the relative speed of conditionals versus traversal. It's a good optimization for random access iterators though.
    I'm not sure you understand the beauty of the algorithm I suggested. If you did why are you so preoccupied with the choise of iterators? It's just a technicality.

    My suggestion, which now seems to be called the "jump algorithm" in this thread, increases the performance of the native approach (where every byte is checked) many times. But I can't take much credit for it because as laserlight noted, it's probably just a special case of the Boyer-Moore algorithm.

    http://en.wikipedia.org/wiki/Boyer%E...arch_algorithm

  15. #45
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: counting consecutive null characters

    It's not a "choice" of iterators. The coder writing the algorithm does not necessarily control what type of iterator they're going to get when the algorithm is called. They could put a restriction on that, but the better option is to fall back to a simpler, less efficient algorithm if the iterator type provided doesn't work with the fast approach.

    The proposed "jump" optimization can be applied best to random-access iterators, and for those I'm sure it works fantastically. I was simply speculating on how well it would extend to the other types, if at all.

    And yes, I did grasp why it's such a good idea.
    Last edited by Lindley; April 19th, 2010 at 11:23 PM.

Page 3 of 5 FirstFirst 12345 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured