dcsimg
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 2 of 4 FirstFirst 1234 LastLast
Results 16 to 30 of 56

Thread: Implementing If-then-else without branching

  1. #16
    Join Date
    May 1999
    Location
    Southern California
    Posts
    12,266
    Originally posted by Kheun
    I am trying understand if there is any pratical value for doing that. Is there any reason to avoid branching? It seems to me that it only obsfucates the code. In my opinion, it is bad for production code, especially for future maintainence.
    It probably complies with a M$ standard of programming.
    "Signature":
    My web site is Simple Samples.
    C# Corner Editor

  2. #17
    Join Date
    Oct 2002
    Location
    Singapore
    Posts
    3,128
    Oh, I have heard that M$ always like to test interviewee's IQ. Probably this question has been designed for testing people's creativity in writing obsfucated code.

  3. #18
    Join Date
    Jun 2002
    Location
    Germany
    Posts
    1,557
    Thanks for this interesting post.

    If you look at my previous response, then I didn't even think it was possible. But enough shifts and nots and there you have it.

    As you can see, I always get stumped by these tongue twisters.

    And I want to trash on this kind of interview technique because I'm so burned out on it. I consider these interview questions to be silly. They are not related to future prospects pertaining to creativity, productivity and stress management of job candidates.

    We've got a lot of really good developers and development projects in our group and we do not focus on problems of this nature in daily development.

    The interviewer should be fired and replaced with someone who actually has an operational command of the sensible management of human resources.

    But a cute post anyway.

    Sincerely, Chris.

    Last edited by dude_1967; February 10th, 2004 at 03:28 AM.
    You're gonna go blind staring into that box all day.

  4. #19
    Join Date
    Apr 1999
    Location
    Altrincham, England
    Posts
    4,470
    I can imagine a valid reason for asking that question at an interview: the candidate who answers "why on earth would you want to do that?" gets the job.
    Correct is better than fast. Simple is better than complex. Clear is better than cute. Safe is better than insecure.
    --
    Sutter and Alexandrescu, C++ Coding Standards

    Programs must be written for people to read, and only incidentally for machines to execute.

    --
    Harold Abelson and Gerald Jay Sussman

    The cheapest, fastest and most reliable components of a computer system are those that aren't there.
    -- Gordon Bell


  5. #20
    Join Date
    Feb 2004
    Posts
    138
    I am sorry , I think it depends on the interviewer's viewpoint at the time he/she made this question and even his/her ability about computer programming can also be a possibility for the reason to the question why such a question comes in the discussion, I think it is still hard to comment on the interviewer or his/her question like that because the question was passed from the interviewee and no real circumstances that can help us to conclude quite strong in such a way.....
    And in case there is some implication about someone, I just wonder if a forgivenness is allowable to the mistakes already made which an excuse was already come right after that, since historically nothing that harmful and that bad as far as I know...

  6. #21
    Join Date
    Jun 2002
    Location
    Letchworth, UK
    Posts
    1,019
    Pity || and && are not allowed.

    if (a) b();

    can be coded as

    a && b();
    Succinct is verbose for terse

  7. #22
    Join Date
    Apr 1999
    Posts
    27,449
    Originally posted by Graham
    I can imagine a valid reason for asking that question at an interview: the candidate who answers "why on earth would you want to do that?" gets the job.
    It's a ridiculous interview question, plain and simple. It serves no practical purpose whatsoever. Branching is part of practically every programming language since the inception of computer programming. To ask on an interview question "not to branch" is stupid, non-sensical, and a waste of time.

    It's like asking a potential taxi driver on a job interview "what if you had no steering wheel in your car, how would you steer it?" instead of testing them on where various streets, roads, and sites are. It's plain stupid.

    This sounds like a question made up by the interviewer to stroke their own ego when no one answers it correctly.

    Regards,

    Paul McKenzie

  8. #23
    Join Date
    Feb 2004
    Posts
    138
    I am sorry for interupting the discussion, what i would like to say is that there are always possibilities to say anything about anyone or anything and I always would like to see the "harmony", the "balance" in thinking and judging people and thats all..
    That was not me...True !

  9. #24
    Join Date
    Jun 2003
    Location
    Armenia, Yerevan
    Posts
    712
    Perhaps they have some products that should be optimized very well, so dynamic branch prediction that works on at least Intel Pentium and higher processors, can cause some overheads (and if prediction is not taken then pipeline should be reloaded), but it's hard to say, which way is more optimal...
    Or perhaps they wanted to know whether interviewee is familiar with binary code, meaning of bitwise operations, or ability to have alternative solutions in some problems where other solutions (more suitable) can't work etc...
    Popular opinion is the greatest lie in the world.

  10. #25
    Join Date
    Apr 1999
    Posts
    27,449
    Originally posted by AvDav
    Perhaps they have some products that should be optimized very well, so dynamic branch prediction that works on at least Intel Pentium and higher processors, can cause some overheads (and if prediction is not taken then pipeline should be reloaded), but it's hard to say, which way is more optimal...
    If there is a well-known, canned answer as J0nas came up with, and the job is in an industry where this optimization is necessary, then maybe it is a valid question.

    However if the answer isn't well known, testing someone on bit manipulation should not have been done with such a question. There are far more fair and better ways to do it.

    Regards,

    Paul McKenzie

  11. #26
    Join Date
    Feb 2004
    Location
    Helsinki, Finland
    Posts
    31
    First post, be prepared

    I didn't consider the question to be stupid at all. Branches are the main cause as to why it is so difficult to make a CPU with higher IPC. I see many of you here are stuck to the old paradigm, where branches are something that can be used casually, without thinking. Partly this is still valid. In traditional CPUs, nowadays mostly used in embedded devices, branches are still the fastest way of doing things. It's only when you get to a CPU with pipeline depth equivalent to Pentium Pro architecture (10+ stages), branches become troublesome. With very deep 20+ -stage pipelines (P4 being the prime example), branches become absolutely critical to performance.

    Modern CPUs have multiple ALUs to potentially do many things in parallel, but the trick here is finding something for them to do. If you stick to in-order execution of instructions, you'll find that most of the instructions require previous results, so most of the time only a single pipeline can be kept full. This is circumvented by out-of order execution, speculatively executing instructions beyond the instruction pointer and searching for independent operations to be shoved onto the pipelines. By doing this we're eventually going to hit a branch (in x86, 1 out of 6 instructions on average). That doesn't seem like a problem, since the branch prediction unit will just make an educated guess and we'll keep on speculating along the predicted path. This can go on for many nested braches and as long as the predictions turn out to be correct everything is fine. But sooner or later we'll find out one of the guesses made along the way was incorrect. This means that A) all the results beyond the mispredicted brach are to be deleted and B) a penalty has to be paid for flushing the pipeline. This causes major havoc inside the CPU and many cycles are lost.

    The burden of smooth execution therefore lies mostly on the shoulders of the branch prediction unit. Modern implementations usually claim high 9x percentages for accuracy. This seems high enough until you realise that the figure includes loops and other highly predictable branches. For a random branch, there's only so much than can be predicted. Usually, static rules (backward branches taken, forward not taken etc.) apply before sufficient statistics have been stored to the Branch Target Buffer. After that, pattern recognition is used to find regularities within the data. It's easy to see that if a branch behaves randomly enough, there's no way to get more than 50% accuracy.

    It's not that there aren't ways to deal with excessive branches. For example, the x86 instruction set includes branchless conditional operations SETcc and CMOV, that could replace some simple branches like
    Code:
    if(a == 0) x = c;
    in machine code. The problem is that even these aren't always used by compilers for both reasons of compatibility and that most branches aren't quite as simple. There's also the question of how many extra instructions can we afford to execute per every condition evaluated for the benefit of never having a misprediction. Since the question closely involves branch predictability, it requires some human intuition. It's relatively easy to see the common case of a branch, whether it's going to be taken more often than not or whether there will be any recognisable patterns. The less predictable a branch, the more redundant things we can afford to execute, up to a point.
    Code:
    int Compare(int x, int y, int z)
    {
    	if(x == 0) return y;
    	else return z;
    }
    Now, let's take the example in question and see what it compiles to and how it might perform. We're using Visual C++ SP5 with standard release optimisations.
    Code:
    ; 86   : 	if ( x == 0 )
    
    	mov	eax, DWORD PTR _x$[esp-4]
    	test	eax, eax
    
    ; 87   : 		return y;
    
    	mov	eax, DWORD PTR _y$[esp-4]
    	je	SHORT $L7551
    
    ; 88   : 	else
    ; 89   : 		return z;
    
    	mov	eax, DWORD PTR _z$[esp-4]
    $L7551:
    
    ; 90   : }
    There are five serially dependent simple instructions plus a branch with unknown predictability. Simple instructions generally execute in one cycle, so we can roughly assume this would take five cycles if the branch is correctly predicted. In case is isn't, the flush penalty in PPro would be typically eight cycles and the flushed instructions would account conservatively for maybe another six cycles. Assuming worst case prediction accuracy, this would happen in every other case.

    So the total cycle count including penalties could be 5 + (8 + 6) / 2 = 12 cycles for PPro. This isn't terribly accurate, but useful as a rough estimate.

    Here's also a simpler alternate solution(sorry j0nas) for the branch-free implementation. The important thing to remember is that comparisons always produce a 1 (true) or 0 (false). This can be used like any other integer in arithmetic operations. Also, normal logical operations will prove useful. Boolean short circuit && and || would produce a branch, so they're not suitable.
    Code:
    int Compare(int x, int y, int z)
    {
    	return ((x != 0) - 1) & y | ((x == 0) - 1) & z;
    }
    We're assuming two's complement here, but a constant table lookup with items '0x0' and '~0x0' could be used in those rare cases the assumption isn't valid.

    The following assembly is produced:
    Code:
    ; 98   : 	return ((x != 0) - 1) & y | ((x == 0) - 1) & z;
    
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	edx, DWORD PTR _y$[esp-4]
    	xor	ecx, ecx
    	test	eax, eax
    	setne	cl
    	dec	ecx
    	and	ecx, edx
    	xor	edx, edx
    	test	eax, eax
    	mov	eax, DWORD PTR _z$[esp-4]
    	sete	dl
    	dec	edx
    	and	edx, eax
    	or	ecx, edx
    	mov	eax, ecx
    
    ; 99   : }
    There are fifteen simple instructions, of which some can be issued in parallel. This means the code block would take less than 15 cycles to execute on average. There aren't any branches and therefore no penalties. As an additional (small) benefit no BTB slots will be allocated.

    Is it beneficial to use a branchless solution here? The total cycle counts for the two implementations are roughly tied with a worst-case branch under PPro. The branchless solution is likely to be faster under P4 'Netburst' architecture so the forward-looking solution would be to use the latter (plus a commented-out branch, if the code is to be read by junior programmers).

  12. #27
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Originally posted by Paul McKenzie
    It's a ridiculous interview question, plain and simple. It serves no practical purpose whatsoever.
    I sort of agree that it is a silly question. But it all depends on what the interviewer is expecting to hear. Potentially, you could answer:
    • Why would you want to do this anyways?
      Indicates that the candidate thinks on his own and has a practical way of approaching problems.
    • It's impossible
      This would indicate that you're either defeatist (meaning that when you don't see it right away, you don't investigate further) or doesn't know how integrated circuits work.
    • Well, I'll have to see... (and then trying different things)
      This is the more interesting response, since the interviewer will be able to see the approach to an (arbitrarily) difficult problem.
    • Give the correct answer
      To me, as an interviewer, this would indicate either that the person is extremely clever or that he has seen this type of question before. As such it's a bit of a useless answer.

    But well, you can draw up such lists for any old silly question like "What is your shoe-size"
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  13. #28
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588
    Originally posted by Synthetic Being
    Is it beneficial to use a branchless solution here? The total cycle counts for the two implementations are roughly tied with a worst-case branch under PPro. The branchless solution is likely to be faster under P4 'Netburst' architecture so the forward-looking solution would be to use the latter (plus a commented-out branch, if the code is to be read by junior programmers).
    Fair enough, but IMHO, the better solution still would be not to use this kind of statement in a time-sensitive loop, use a look-up table (with enough padding of other instructions) or use some other solution.

    If performance really comes down to this statement, then... maybe do this kind of optimization, but then you'll want to implement it in assembler anyways.

    [Edit: the difference is quite small, but does exist, as the following code executed on a P4 2000 shows:

    Result:
    Code:
    Time taken for non-jump:
    0.003406 seconds
    Time taken for jump:
    0.003332 seconds
    Code:
    Code:
    const int NUM_EXEC = 1000000;
    
    void test_func_1()
    {
      int k, x, y, z;
      x = 1;
      y = 0xFF1E;
      z = 0x172A;
      CPrecisionTimer ct;
      double res;
    
      ct.Start();
    
      for (int i = 0; i < NUM_EXEC; ++i) {
        k = ((x != 0) - 1) & y | ((x == 0) - 1) & z;
        // some other code, to let the processor do other things as well
        k += z;
        k -= y;
        z = (z | y) - (z | ~y);
      }
    
      x = 0;
      
      for (i = 0; i < NUM_EXEC; ++i) {
        k = ((x != 0) - 1) & y | ((x == 0) - 1) & z;
        // some other code, to let the processor do other things as well
        k += z;
        k -= y;
        z = (z | y) - (z | ~y);
      }
      
      res = ct.Stop();
      printf("Time taken for non-jump:\n%f seconds\n", res);
    }
    
    void test_func_2()
    {
      int k, x, y, z;
      x = 1;
      y = 0xFF1E;
      z = 0x172A;
      CPrecisionTimer ct;
      double res;
    
      ct.Start();
    
      for (int i = 0; i < NUM_EXEC; ++i) {
        k = ( x ? y : z);
        // some other code, to let the processor do other things as well
        k += z;
        k -= y;
        z = (z | y) - (z | ~y);
      }
    
      x = 0;
      
      for (i = 0; i < NUM_EXEC; ++i) {
        k = (x ? y : z);
        // some other code, to let the processor do other things as well
        k += z;
        k -= y;
        z = (z | y) - (z | ~y);
      }
      
      res = ct.Stop();
      printf("Time taken for jump:\n%f seconds\n", res);
    }
    
    int main()
    {
      test_func_1();
      test_func_2();
      getch();
      return 0;
    }
    Last edited by Yves M; February 10th, 2004 at 01:03 PM.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  14. #29
    Join Date
    Feb 2004
    Location
    Helsinki, Finland
    Posts
    31
    Originally posted by Yves M
    Fair enough, but IMHO, the better solution still would be not to use this kind of statement in a time-sensitive loop, use a look-up table (with enough padding of other instructions) or use some other solution.
    Lookup tables are fine as long as you have some constants to look up for. Too many tables and you'll trash the cache though.
    Result:
    Code:
    Time taken for non-jump:
    0.003406 seconds
    Time taken for jump:
    0.003332 seconds
    It's nice when someone bothers to to do the gruntwork of verification. It seems the reference branches are extremely predictable (x being the same for an eternity) so I'm a little surprised the branchless solution isn't more behind. I wonder what the result might be with a totally unpredictable branch?

  15. #30
    Join Date
    May 1999
    Location
    Southern California
    Posts
    12,266
    Originally posted by Synthetic Being
    I didn't consider the question to be stupid at all. Branches are the main cause as to why it is so difficult to make a CPU with higher IPC. I see many of you here are stuck to the old paradigm, where branches are something that can be used casually, without thinking.
    I have not re-read this thread after reading your response, but I sure don't remember anyone saying that branching is bad. The original question (supposedly as asked by the interviewer) asked not to use branching but that is the only place I remember where branching was perhaps suggested as bad.

    IPC? This discussion has such remote relevance to interprocess communication as to make IPC irrelevant.

    Since you used such strong words, I will reply correspondingly. I notice that sometimes people try harder to explain to others what it is that the others don't understand and then blame them for not listening. Often the truth is that the person trying to explain needs to listen and understand. I think you are trying to show off your knowledge of low-level processor design but you did not spend much time understanding this discussion.
    "Signature":
    My web site is Simple Samples.
    C# Corner Editor

Page 2 of 4 FirstFirst 1234 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width




On-Demand Webinars (sponsored)