CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 4 of 4 FirstFirst 1234
Results 46 to 56 of 56
  1. #46
    Join Date
    Feb 2004
    Location
    Helsinki, Finland
    Posts
    31

    Re: Implementing If-then-else without branching

    Now that this old thread has been lifted, I might add some fresh comments on the issue.

    There are many problems related to application performance. Branches are a relatively minor one when compared to improvements that can be had by using better algorithms, optimising memory usage patterns, vectorisation etc. Most of the time applying these techniques isn't even necessary. However, sometimes inner loops spend most of their time doing simple conditional operations. This is when branch elimination becomes truly useful in performance-critical applications.

    In order to benefit those rare occasions, I've come up with an overanalysed solution, a very fast but not the most versatile high-level branch replacement code for mainstream CPU architectures.

    Now, let's go through some of the other solutions first. Those have been converted to inline functions resembling the x ? a : b operator, but obviously being more restricted in their application. This is done in the interest of finding the best performing solution, not following the restrictions of the original assignment, except by coincidence.

    Code:
    inline const int conditOperA(const int compare, const int a, const int b) {
    	return (!compare - 1) & a | (compare - 1) & b;
    }
    I'm calling this replacement Type A. The compare given as first parameter is converted to an integer with a binary value, which is then used to generate a mask for selecting the correct result before combining them. This is mostly identical to my previous attempt. The actual compare is now done only once due to it being passed as a parameter, which may have removed a small bottleneck.

    Code:
    inline const int conditOperB(const int compare, const int a, const int b) {
    	return ((compare - 1) & (b ^ a)) ^ a;
    }
    This is Type B, a slightly more obfuscated version of Type A. Intel's recommendation is more or less identical to this one, but done with ADD/SUB instead of XOR. I see a variation of this one has already been presented in this thread.

    Certainly, if manufacturer's documentation recommends a certain course of action, that has to be very close to the optimal solution, right? Still, going through the quoted performance of certain instructions in the (in)famous Netburst architecture got me thinking. If all the instructions which are useful for branch replacement are taken under observation, we can see some interesting details:
    Code:
    Architecture	Pentium Pro and most other x86		NetBurst
    
    Instruction	Latency (cycles)			Latency (cycles)
    sbb		1					6
    cmovcc		1					5
    setcc		1					5
    sar		1					4 (Prescott: 1)
    We notice that flag using instructions are generally slow in P4. These instructions are used to find out the result of a comparison. While the pipelined throughput even for these instructions is still 1/1 or close enough, the latencies are problematic if they can't be hidden. This isn't always achieved in practice. The best solution should thus logically be free of these instructions. If a binary decision modified the MSB of an integer, the SAR instruction could be used to generate a mask.

    Code:
    const unsigned int INT_BITS = CHAR_BIT * sizeof(int);
    
    inline const int conditOperC(const int compare, const int a, const int b) {
    	return ((compare >> (INT_BITS - 1)) & (b ^ a)) ^ a;
    }
    This function, Type C, differs from the previous ones in that it interprets negative as false and everything else as true. This is required in order to avoid comparison instructions. The way it makes bit masking possible is based on the fact that compilers use arithmetic shift for signed integers, which duplicates the sign for emerging bits. Negative numbers produce a row of 1's for the AND-mask while everything is set as 0 for positive numbers. Generally, mixing shifts and signed numbers isn't a good idea unless you know what you're doing.

    Comparing two numbers without actually doing a compare is a bit tricky. Here are a few usage examples including the conditional operator and its branchless counterpart:

    Code:
    int x, y;		// Preferably of type int
    int a, b, result;	// Type can vary as long as the bit count is at most equal
    			// to that of the first parameter in the function
    ...
    result = (x < 0) ? a : b;		result = conditOperC(x, b, a);
    result = (x > 0) ? a : b;		result = conditOperC(-x, b, a);
    result = (x >= 0) ? a : b;		result = conditOperC(x, a, b);
    result = (x <= 0) ? a : b;		result = conditOperC(-x, a, b);
    
    result = (x >= y) ? a : b;		result = conditOperC(x - y, a, b);	// [1]
    result = (x > y) ? a : b;		result = conditOperC(y - x, b, a);	// [1]
    result = (x == y) ? a : b;		result = conditOperC(-abs(x - y), a, b);
    
    // [1] Limitation: sign(x) == sign(y) or x == 0 or y == 0
    With some pointer trickery from this thread, the code could be used to compare, say, floats. But clearly there are limitations to when it can be used.

    The test code has been included as an attachment. I've included both a very easy branch and a very hard-to-predict branch. There are also separate test cases for Type C unequality (C1) and equality (C2), since the latter is more complicated to implement. The switch..case required for test selection without duplicating code compiles to a jump table, so its effect on performance should be negligible.

    I've done some test results for CPUs I had available. Like before, the application is compiled with VC++ 6.

    Code:
    486DX4/75:
    branch (easy)		54.2 seconds
    branch (hard)		52.4 seconds
    A			65.7 seconds
    B			65.8 seconds
    C1 (relative size)	55.2 seconds
    C2 (equality)		62.2 seconds
    Here, we actually see the "hard" branch outperforming the "easy" branch. The 486 uses a hard-wired prediction of "branch not taken", when the opposite is always true in the first case. In the second branch, this prediction is right about 50% of the time. Since the difference is so small due to the short pipeline, such a simplistic approach to branching is justified. Clearly, no branch replacements are necessary in this class of CPUs. All the alternatives are slower, but C1 comes fairly close in speed.

    Code:
    K6-2/333:
    branch (easy)		7.18 seconds
    branch (hard)		7.78 seconds
    A			9.76 seconds
    B			9.44 seconds
    C1 (relative size)	7 seconds
    C2 (equality)		8.22 seconds
    The K6 also has a rather short pipeline and is relatively immune to branch predictability. However, branch replacement C1 manages to slightly outperform even a best case branch. No other replacement performs well enough to be recommended.

    Code:
    PIII/700:
    branch (easy)     		2.2 seconds
    branch (hard)     		2.85 seconds
    A                 		2.2 seconds
    B                 		2.2 seconds
    C1 (relative size) 		2.05 seconds
    C2 (equality)      		2.37 seconds
    The P6 core takes a significant hit with a mispredicted branch, making all replacement options viable. Again, we see C1 outperforming the rest of the field making it the most recommended option.

    Code:
    P4 "Northwood"/3.0:
    branch (easy)			0.433 seconds
    branch (hard)     		0.826 seconds
    A                 		0.525 seconds
    B                 		0.513 seconds
    C1 (relative size) 		0.429 seconds
    C2 (equality)      		0.573 seconds
    Here we see the pretty much the same pattern as above, except that the penalty for a poorly predictable branch becomes so overwhelming as to be clearly avoided if at all possible.

    In conclusion of these results, Type C1 is clearly the fastest branch replacement when it can be used, surprisingly so even in architectures other than Netburst. In other occasions, I'd choose between Type B and a branch, depending on its predictability.
    Attached Files Attached Files

  2. #47
    Join Date
    Apr 2003
    Location
    Los Angeles area
    Posts
    776

    Re: Implementing If-then-else without branching

    Something else to consider. I was curious about this and later found mention somewhere that there are undocumented (or obscurely documented) instructions for the P4 to control the branch prediction unit in the CPU.
    "The Chicken and Rice MRE is not a personal lubricant."

  3. #48
    Join Date
    Jun 2004
    Location
    Kerala, India
    Posts
    4

    Smile Re: Implementing If-then-else without branching

    Quote Originally Posted by AvDav
    Code:
    int compare(int x, int y, int z)
    {
    	int a[]={z, y};
    	return a[x==0];
    }
    Have fun...
    There is an error.

    Comparison operator return zero if comparison fails and return non zero if comparison success. Non zero may be 1, 6 -6 etc.. . We cannot predict that it is 1 when comparison success.

    Then in the above code if x is not zero then a[z==0] may become a[-15] or a[7] etc. Then the output may be an error. Is it?

  4. #49
    Join Date
    Jun 2004
    Location
    Kerala, India
    Posts
    4

    Re: Implementing If-then-else without branching

    Quote Originally Posted by Hitesh1903
    This is how the question was put to me.

    An if-then-else when written in assembly will generate a jump operation. Write the if-then-else clause so that you don't have a jump statement. And the question was followed with what is given on the first post.

    Now just for an example:

    Code:
    if x, y being two integers
    
    if ( x == 0 )
         y = 0;
    else
         y = 1;
    
    can be written simply as
    
    y = !!x;
    This was another one on the test.
    U can write code like this

    int res = !x * y;
    res += !!x * z;
    return res;

  5. #50
    Join Date
    Feb 2004
    Location
    Helsinki, Finland
    Posts
    31

    Re: Implementing If-then-else without branching

    Quote Originally Posted by albinbaby
    There is an error.

    Comparison operator return zero if comparison fails and return non zero if comparison success. Non zero may be 1, 6 -6 etc.. . We cannot predict that it is 1 when comparison success.
    No, there isn't. The return type from relational and equality operators is bool, which says in the C++ standard. Only an uninitialised bool could have values other than 0 or 1.

  6. #51
    Join Date
    Jun 2004
    Location
    Kerala, India
    Posts
    4

    Re: Implementing If-then-else without branching

    Quote Originally Posted by Synthetic Being
    No, there isn't. The return type from relational and equality operators is bool, which says in the C++ standard. Only an uninitialised bool could have values other than 0 or 1.
    You are right.

    But in many compilers previous to the publication of the ANSI-C++ standard, as well as in the C language, the relational operations did not return a bool value true or false, rather they returned an int as result with a value of 0 in order to represent "false" and a value different from 0 (generally 1) to represent "true".

    See http://www.cplusplus.com/doc/tutorial/tut1-3.html


  7. #52
    Join Date
    Mar 2003
    Location
    India {Mumbai};
    Posts
    3,871

    Re: Implementing If-then-else without branching

    Quote Originally Posted by albinbaby
    You are right.
    No, Synthetic is perfectly right.
    In C the result of a condition is either true or false. Here 'true' and 'false' are just conceptual materials and not specific to language. The result is guaranteed to be 0 for false, and 1 for true. As you know the condition can be true (white) or false (black) and nothing else (no GRAY exists!). So, it is needless to mention non-false (true) as non-ONE -- it is always 1.
    My latest article: Explicating the new C++ standard (C++0x)

    Do rate the posts you find useful.

  8. #53
    Join Date
    Feb 2004
    Location
    Helsinki, Finland
    Posts
    31

    Re: Implementing If-then-else without branching

    I think the numeric values of 'true' and 'false' weren't in doubt but rather if those values were actually returned. In standard C++, this is obviously the case. In C99 the return type is defined as int (0 or 1), which is rather unambiguous as well. IMO, we shouldn't let the implementation of new code be hampered by pre-standard compilers using uncommon conventions. Like albinbaby said, different from 0 generally ment 1. Which is likely why it became standardised in the first place.

  9. #54
    Join Date
    Jun 2004
    Location
    Kerala, India
    Posts
    4

    Re: Implementing If-then-else without branching

    Quote Originally Posted by Ajay Vijay
    No, Synthetic is perfectly right.
    In C the result of a condition is either true or false. Here 'true' and 'false' are just conceptual materials and not specific to language. The result is guaranteed to be 0 for false, and 1 for true. As you know the condition can be true (white) or false (black) and nothing else (no GRAY exists!). So, it is needless to mention non-false (true) as non-ONE -- it is always 1.
    Thanks Ajay.
    But I was confused by the word in following sites.
    Please have a look at them and clarify
    Thanks in advance
    Albin

    "In C, ``false'' is represented by a value of 0 (zero), and ``true'' is represented by any value that is nonzero. Since there are many nonzero values (at least 65,534, for values of type int), when we have to pick a specific value for ``true,'' we'll pick 1. "

    From http://www.eskimo.com/~scs/cclass/notes/sx3c.html

    "C's concept of ‘true’ and ‘false’ boils down to simply ‘non-zero’ and ‘zero’, respectively. Where we have seen expressions involving relational operators used to control do and if statements, we have just been using expressions with numeric results. If the expression evaluates to non-zero, then the result is effectively true. If the reverse is the case"

    From http://publications.gbdirect.co.uk/c...ask_ahead.html

    "In many compilers previous to the publication of the ANSI-C++ standard, as well as in the C language, the relational operations did not return a bool value true or false, rather they returned an int as result with a value of 0 in order to represent "false" and a value different from 0 (generally 1) to represent "true". "

    From http://www.cplusplus.com/doc/tutorial/tut1-3.html



    And more over C does not provide a standard Boolean type,
    See http://www.eskimo.com/~scs/C-faq/q9.1.html

  10. #55
    Join Date
    Mar 2003
    Location
    India {Mumbai};
    Posts
    3,871

    Re: Implementing If-then-else without branching

    I looked at all links you provided, but none of them clearly says that false is NOT-ZERO and true is NOT-ONE. I dont see any compiler or langauge specification that voilates true=1 and false=0.
    Further, it is correct that non-initialization of 'bool' (primitive in language, NOT typedef, #defined or enumerated) may lead it to value which is neither zero nor one -- reason is simple, bool uses one byte (8-bits) and not one bit! And such that the result of unknown bits may lead to unpredictable non-initialized value.

    All in all, result of conditional and/or relational operator will be 1 or 0 only and NOTHING ELSE.
    My latest article: Explicating the new C++ standard (C++0x)

    Do rate the posts you find useful.

  11. #56
    Join Date
    Mar 2003
    Location
    India {Mumbai};
    Posts
    3,871

    Re: Implementing If-then-else without branching

    Quote Originally Posted by eskimo.com
    Since there are many nonzero values (at least 65,534, for values of type int), when we have to pick a specific value for ``true,'' we'll pick 1. "
    It is similar to sizeof of a class/struct which does not have any data-members.
    My latest article: Explicating the new C++ standard (C++0x)

    Do rate the posts you find useful.

Page 4 of 4 FirstFirst 1234

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured