CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 4 1234 LastLast
Results 1 to 15 of 53
  1. #1
    Join Date
    Aug 2017
    Posts
    31

    Question Is there anyway to make this C-code faster?

    Many thanks!


    BYTE* B;
    int* C;
    int N

    …..
    ……
    ……

    BYTE Result = 0;
    for ( int i=0; i<N; i++ )
    Result |= B[C[i]];

  2. #2
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    BYTE* B; array of {0,1} only
    int* C; array of indicies
    int N ( is about [100,1000]

    …..
    ……
    ……

    BYTE Result = 0;
    for ( int i=0; i<N; i++ ) Result |= B[C[i]];

    Result is 0,1 obviously. In my experience any "if" statement inside the loop
    vastly slows down the process (for my typical N ~1000)
    Last edited by mkh01; August 31st, 2017 at 05:42 PM.

  3. #3
    Join Date
    Feb 2017
    Posts
    677

    Re: Is there anyway to make this C-code faster?

    You could split the loop into several parts and run them in parallel. Afterwards you or together the partial results from all runs to get the final result. With a quad processor it should be at least 3 times faster.

    When the result becomes hex FF the loop can be exited because nothing can change that result. If this happens fairly often and fairly early during looping then it may be worth testing for this condition and so shorten the total looping on average.

    If C[i] always accesses every element of B[] then C[i] isn't necessary and can be replaced by just i (because the order of or doesn't matter).
    Last edited by wolle; August 31st, 2017 at 05:21 PM.

  4. #4
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    Thank you.
    0. You suggestion would probably make sense for very large Ns, say >100K

    1. For my scenario N is in [100,1000], so the thread overhead is prohibitive

    2. Also I have a parallelization on a higher level that calls this routine with all 32-threads
    i have.

  5. #5
    Join Date
    Feb 2017
    Posts
    677

    Re: Is there anyway to make this C-code faster?

    What about the GPU?

  6. #6
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    1. Adding any kind of conditional "if" statement is vastly slowing down the execution in this scenario (like slower x4).
    This might be processor-specific (i7-5930K 6-core) in my case.

    2. Important detail: B can be only {0,1}

  7. #7
    Join Date
    Feb 2017
    Posts
    677

    Re: Is there anyway to make this C-code faster?

    Quote Originally Posted by mkh01 View Post
    B can be only {0,1}
    Branch-free code is faster but an optimizing compiler should be able to coerce the branch-prediction circuit into assuming the if statement will be false until the one-off event that it isn't.

    Code:
    BYTE Result = 0;
    for (int i=0; i<N; i++) {
    	Result = B[C[i]];
    	if (Result != 0) break; // instruction pipeline fault should happen once only
    }
    Last edited by wolle; September 1st, 2017 at 01:59 AM.

  8. #8
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Is there anyway to make this C-code faster?

    first of all, I’d follow wolle’s advice of exploiting ‘or’ properties, and adapt your data structures accordingly, eg. B and C seems a very inefficient choice ( how are B and C created/updated ? ); in fact, the reason why you observe a big slow down when adding an ‘if’ is not probably the branch per se ( that’s easily predictable ) but the excessive cache hits if B is big and C indices spread out in memory.
    If this is the case, it may be even worth it to sort C before doing what wolle suggested in post #7…

  9. #9
    2kaud's Avatar
    2kaud is online now Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,824

    Re: Is there anyway to make this C-code faster?

    Without changing data structures as suggested by superbonzo in post #8, (which would probably provide the biggest efficiency gain), you could try

    Code:
    	for (int *cp = C, *ce = C + N; cp != ce && !Result; Result = B[*cp++]);
    In previous tests, pointer usage is more efficient than using array element access.

    PS Why do you need to access B via C? All you seem to be interested in is whether B contains at least one 1 - so why not just iterate over B until a 1 is found? How big is B? If you are using higher-level parallelization, is that splitting B into the various 'chunks' accessed via C?
    Last edited by 2kaud; September 1st, 2017 at 04:38 AM. Reason: PS
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  10. #10
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    My sincere thanks.

    I tried your code and it runs about 15% slower in my scenario.

  11. #11
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    Thanks so much for the code snippet.
    I inserted and profiled it. Unfortunately it runs about 2 times slower than the original code.
    I am using VC++ 2015 with all the optimization flags

  12. #12
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    >>why not just iterate over B until a 1 is found

    Because i am only interested in selected indices - and that info is in C.

    Overall I am very impressed by other members desire to help out.
    Perhaps, I should also post the entire procedure, which is
    a "Greedy Algorithm for Minimal Independent Sets on Random Graphs"

    The graphs I was profiling this were N=500 with edge density 0.5

  13. #13
    2kaud's Avatar
    2kaud is online now Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,824

    Re: Is there anyway to make this C-code faster?

    Quote Originally Posted by mkh01 View Post
    1. Adding any kind of conditional "if" statement is vastly slowing down the execution in this scenario (like slower x4).
    This might be processor-specific (i7-5930K 6-core) in my case.
    Yeah, that's what your comments in posts #10 & #11 bear out. Interesting. Iterating the full loop with an |= is quicker than adding a test and terminating the loop earlier. Hmmmm. Do you have any analysis as the distribution of the '1' in B based upon C index? The loops tried iterate through c from start to end. Is there any advantage to iterating C from end to start?

    Is there some simple test code that could be used to test different scenarios with the loop? How big is the B array?
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  14. #14
    Join Date
    Aug 2017
    Posts
    31

    Re: Is there anyway to make this C-code faster?

    Thanks so much again for the continuing discussion.

    // Do you have any analysis as the distribution of the '1' in B based upon C index?
    B is just a row in a Graph adjacency matrix. I work with graph anywhere N=[500,5000]
    Also graphs are of different density d = [0.1,0.99], but I am testing it now for d=0.5
    So you can assume B is filled with random about equal amount of {0,1}

    I would not mind to switch to a bit-representation of the adjacency matrix (8 time smaller, better cache?)
    But as far as i remember it was still a slowdown due to code for accumulating bits.

    //The loops tried iterate through c from start to end. Is there any advantage to iterating C from end to start?

    I will try that.


    //Is there some simple test code that could be used to test different scenarios with the loop? How big is the B array?[/QUOTE]

    I am very impressed with the feedback and will try to post a complete procedure for the Independent Sets, probably 30 or so lines of C code.
    But then there should be an extra code for the random graph generation.

  15. #15
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Is there anyway to make this C-code faster?

    Quote Originally Posted by mkh01 View Post
    Also graphs are of different density d = [0.1,0.99], but I am testing it now for d=0.5
    So you can assume B is filled with random about equal amount of {0,1}
    this means you’d need just 2 cycles on avarage to get a result as by post #7 given a random C with no repetitions; it’s highly unplausible for such a loop to run slower than a full C traversal. So, unless C has a very strange pattern, there must be some other reason for the slow down, for example, it may be a case of false sharing ( do all the 32 threads access the same B ? )

    anyway, you should really post some minimal code reproducing your timings in order for us to help …

Page 1 of 4 1234 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured