Results 46 to 53 of 53
  1. #46
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
    Superbonzo, at most, Optimization doesn't need a genius programmer
    This is your first and probably biggest mistake, because it DOES need a genius programmer. Or rather, since "genius" has the wrong connotation, "expert" is the better word.


    99.9% of programmers out there have received no formal training in optimization whatsoever.

    Some have, but it rarely goes beyond "try a few different approaches and pick the best one" (or rather, pick the one that somehow worked best for the chosen datasets; it could really be the worst of the options when dealing with live data).

    There are only a very few programmers who can look at a bunch of code, or at profiler output, and see what's wrong and how to tackle the problem.

    And I'm talking just "code" here. There are actually more optimisation experts in specific programming-related fields, such as improving SQL queries, fixing network-related issues, and tuning the GPU (3D) pipeline. I don't really catalogue them as general code optimizers per se; what they do is mostly improving the interaction between their app and 3rd party APIs by tailoring the code to suit the needs of the 3rd party API.

    modern compilers are very poor things to perform true HPC.
    That's because that's not the job of an (optimizing) compiler. You expect the compiler to produce an object/binary in an acceptable amount of time, one that will run the program with a reasonable amount of performance.

    By that standard, compilers have become incredibly good at what they do. But in the end, they produce 'general purpose code', just like most libraries provide 'general purpose containers and routines'.
    Specialising always makes code go faster, but compilers and libraries can't afford to go that route, unless you have compilers that are specialized for a particular task (and those do exist). There are compilers out there that will produce fascinating, blindingly fast code for math problems, but which won't do well on other problem domains.


    No magic, Just good Asming does do it for
    And you will never know unless you write your fsort WITH if. Compile both with optimizations turned on, and run them side by side on the same computer with the same datasets.
    I highly doubt removing a couple of ifs will produce improvements of the same order as the fsort/qsort comparison (which is comparing apples with oranges).

    Micro-optimizing doesn't provide that kind of percentage improvement, and 'handholding the compiler' definitely doesn't.
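
    The side-by-side test described above could be set up roughly as follows. This is only a sketch under assumptions: the names `cmp_if` and `time_sort` are made up for illustration, `qsort` stands in for fsort (whose source isn't shown in this thread), and `clock()` is a coarse timer, so small differences would need many repetitions to show up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Branchy comparator with the same ordering as the one posted in this
   thread (descending: returns 1 when a < b). */
static int cmp_if(const void *a, const void *b)
{
    const double x = *(const double *)a;
    const double y = *(const double *)b;
    if (x < y) return 1;
    else if (x > y) return -1;
    else return 0;
}

/* Hypothetical helper: sort a private copy of data[0..n-1] with the
   given comparator and return the CPU seconds used. The input array is
   left untouched so every candidate sees exactly the same dataset;
   the sorted copy is written to out. */
static double time_sort(const double *data, double *out, size_t n,
                        int (*cmp)(const void *, const void *))
{
    assert(data != NULL && out != NULL);
    memcpy(out, data, n * sizeof *data);

    const clock_t t0 = clock();
    qsort(out, n, sizeof *out, cmp);
    const clock_t t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

    Call `time_sort` once per comparator (or per sort routine) on the same input, build with optimizations enabled, and repeat with several datasets (random, already sorted, many duplicates), since a win on one dataset may not carry over to live data.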

  2. #47
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    Some have, but it rarely goes beyond "try a few different approaches and pick the best one" (or rather, pick the one that somehow worked best for the chosen datasets; it could really be the worst of the options when dealing with live data).
    OReubens, the performance of any algorithm is:

    1. data-dependent.
    2. hardware-dependent.
    ---------------------------------------
    In true HPC, we can't rely just on some kind of math abstraction.
    That's because that's not the job of an (optimizing) compiler. You expect the compiler to produce an object/binary in an acceptable amount of time, one that will run the program with a reasonable amount of performance.

    By that standard, compilers have become incredibly good at what they do. But in the end, they produce 'general purpose code', just like most libraries provide 'general purpose containers and routines'.
    Specialising always makes code go faster, but compilers and libraries can't afford to go that route, unless you have compilers that are specialized for a particular task (and those do exist). There are compilers out there that will produce fascinating, blindingly fast code for math problems, but which won't do well on other problem domains.
    That's exactly what I've been saying: it takes purpose-built hardware to provide fast-running compilers capable of generating truly efficient output.

    And you will never know unless you write your fsort WITH if. Compile both with optimizations turned on, and run them side by side on the same computer with the same datasets.
    I highly doubt removing a couple of ifs will produce improvements of the same order as the fsort/qsort comparison (which is comparing apples with oranges).

    Micro-optimizing doesn't provide that kind of percentage improvement, and 'handholding the compiler' definitely doesn't.
    As I said before, the if-based version would be good too, but it can't beat the 'if-reduced' one. Let's look at a simple example:

    Code:
    static int compare (const void * a, const void * b){
        if (*(const double*)a < *(const double*)b) return 1;
        else if (*(const double*)a > *(const double*)b) return -1;
        else return 0;  
    }
    Now the output compiled with -o3:
    Code:
    Dump of assembler code for function compare(void const*, void const*):
       0x0000000000406df6 <+0>:     push   %rbp
       0x0000000000406df7 <+1>:     mov    %rsp,%rbp
       0x0000000000406dfa <+4>:     mov    %rdi,-0x8(%rbp)
       0x0000000000406dfe <+8>:     mov    %rsi,-0x10(%rbp)
       0x0000000000406e02 <+12>:    mov    -0x8(%rbp),%rax
       0x0000000000406e06 <+16>:    movsd  (%rax),%xmm1
       0x0000000000406e0a <+20>:    mov    -0x10(%rbp),%rax
       0x0000000000406e0e <+24>:    movsd  (%rax),%xmm0
       0x0000000000406e12 <+28>:    ucomisd %xmm1,%xmm0
       0x0000000000406e16 <+32>:    jbe    0x406e1f <compare(void const*, void const*)+41>
       0x0000000000406e18 <+34>:    mov    $0x1,%eax
       0x0000000000406e1d <+39>:    jmp    0x406e41 <compare(void const*, void const*)+75>
       0x0000000000406e1f <+41>:    mov    -0x8(%rbp),%rax
       0x0000000000406e23 <+45>:    movsd  (%rax),%xmm0
       0x0000000000406e27 <+49>:    mov    -0x10(%rbp),%rax
       0x0000000000406e2b <+53>:    movsd  (%rax),%xmm1
       0x0000000000406e2f <+57>:    ucomisd %xmm1,%xmm0
       0x0000000000406e33 <+61>:    jbe    0x406e3c <compare(void const*, void const*)+70>
       0x0000000000406e35 <+63>:    mov    $0xffffffff,%eax
       0x0000000000406e3a <+68>:    jmp    0x406e41 <compare(void const*, void const*)+75>
       0x0000000000406e3c <+70>:    mov    $0x0,%eax
       0x0000000000406e41 <+75>:    pop    %rbp
       0x0000000000406e42 <+76>:    retq   
    End of assembler dump.
    Now with -o0:
    Code:
    Dump of assembler code for function compare(void const*, void const*):
       0x0000000000406df6 <+0>:     push   %rbp
       0x0000000000406df7 <+1>:     mov    %rsp,%rbp
       0x0000000000406dfa <+4>:     mov    %rdi,-0x8(%rbp)
       0x0000000000406dfe <+8>:     mov    %rsi,-0x10(%rbp)
       0x0000000000406e02 <+12>:    mov    -0x8(%rbp),%rax
       0x0000000000406e06 <+16>:    movsd  (%rax),%xmm1
       0x0000000000406e0a <+20>:    mov    -0x10(%rbp),%rax
       0x0000000000406e0e <+24>:    movsd  (%rax),%xmm0
       0x0000000000406e12 <+28>:    ucomisd %xmm1,%xmm0
       0x0000000000406e16 <+32>:    jbe    0x406e1f <compare(void const*, void const*)+41>
       0x0000000000406e18 <+34>:    mov    $0x1,%eax
       0x0000000000406e1d <+39>:    jmp    0x406e41 <compare(void const*, void const*)+75>
       0x0000000000406e1f <+41>:    mov    -0x8(%rbp),%rax
       0x0000000000406e23 <+45>:    movsd  (%rax),%xmm0
       0x0000000000406e27 <+49>:    mov    -0x10(%rbp),%rax
       0x0000000000406e2b <+53>:    movsd  (%rax),%xmm1
       0x0000000000406e2f <+57>:    ucomisd %xmm1,%xmm0
       0x0000000000406e33 <+61>:    jbe    0x406e3c <compare(void const*, void const*)+70>
       0x0000000000406e35 <+63>:    mov    $0xffffffff,%eax
       0x0000000000406e3a <+68>:    jmp    0x406e41 <compare(void const*, void const*)+75>
       0x0000000000406e3c <+70>:    mov    $0x0,%eax
       0x0000000000406e41 <+75>:    pop    %rbp
       0x0000000000406e42 <+76>:    retq   
    End of assembler dump.
    Do you see any differences? There's a lot of data flow between memory and registers. Could you shed some light on where the compiler would use 3OE approaches to boost the code, or reduce data flow as well?
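
    The two dumps above are identical, and both set up a frame pointer and spill the arguments to the stack, which is what unoptimized output typically looks like. One possible explanation, assuming GCC or Clang was used (the exact command line isn't shown): those compilers spell the optimization level with an uppercase O (-O3, -O0), while lowercase -o names the output file. For comparison, here is a minimal sketch of what an 'if-reduced' form of the same comparator might look like in C. The names `compare_if`, `compare_noif` and `check_agreement` are hypothetical, and whether the generated code ends up free of jumps is still up to the compiler:

```c
#include <assert.h>
#include <stddef.h>

/* The comparator from the post, with each operand loaded once into a
   local so the source itself doesn't ask for repeated memory reads. */
static int compare_if(const void *a, const void *b)
{
    const double x = *(const double *)a;
    const double y = *(const double *)b;
    if (x < y) return 1;
    if (x > y) return -1;
    return 0;
}

/* Assumed "if-reduced" form: each relational expression evaluates to
   0 or 1, so their difference is 1, -1 or 0 without any if statement
   in the source. */
static int compare_noif(const void *a, const void *b)
{
    const double x = *(const double *)a;
    const double y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Check that both forms agree on every ordered pair of v[0..n-1]. */
static void check_agreement(const double *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            assert(compare_if(&v[i], &v[j]) == compare_noif(&v[i], &v[j]));
}
```

    Both functions have the signature `qsort` expects, so either can be passed to it unchanged and their disassembly compared at the same optimization level.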

  3. #48
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,822

    Re: IF-free conception.

    And what is your equivalent if-reduced C code, and what assembler output does it produce?

  4. #49
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    2kaud, actually, I put the 'pivot' and some other variables in xmmXX and mmX registers, which is quite useful for reducing data flow. In fact, loops must be as light as possible. By the way, these lines:
    Code:
       0x0000000000406e0e <+24>:    movsd  (%rax),%xmm0
       0x0000000000406e12 <+28>:    ucomisd %xmm1,%xmm0
    would be better changed to:
    Code:
     ucomisd (%rax),%xmm1
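
    For the point about keeping the pivot in a register and the loop light, a C-level sketch might look like the following. It is an illustration only, not the fsort code (which isn't shown in this thread): `partition_desc` is a hypothetical name, the scheme is a plain Lomuto-style partition, and reading the pivot into a local merely gives the compiler the chance to keep it in a register; the actual allocation is the compiler's decision.

```c
#include <assert.h>
#include <stddef.h>

/* Partition a[0..n-1] (n >= 1) around the value of its last element
   and return the pivot's final index. Elements greater than the pivot
   end up to its left (descending order, matching the comparator
   posted earlier in the thread). */
static size_t partition_desc(double *a, size_t n)
{
    assert(a != NULL && n >= 1);

    const double pivot = a[n - 1];   /* read once, outside the loop */
    size_t store = 0;

    for (size_t i = 0; i + 1 < n; ++i) {
        const double v = a[i];       /* one load per element */
        if (v > pivot) {
            a[i] = a[store];
            a[store] = v;
            ++store;
        }
    }

    /* Move the pivot into its final slot. */
    a[n - 1] = a[store];
    a[store] = pivot;
    return store;
}
```

    Looking at the optimized disassembly of the loop is the way to confirm whether `pivot` really stays in an xmm register on a given compiler.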

  5. #50
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
    would be better changed to:
    Code:
     ucomisd (%rax),%xmm1
    Actually... NO.
    The compiler-generated version is better.

    You're assuming that fewer lines of assembly (fewer instructions) is better, or that whichever solution uses the fewest clock cycles according to the instruction timings is the fastest. This is simply not the case.
    Code:
    movsd  (%rax),%xmm0
    ucomisd %xmm1,%rax
    causes stalls.
    1) Execution of the 2nd instruction cannot proceed until the first one has completed.
    2) It also needs to commit the change over to the other ALU.
    This means you have 2 clock cycles where both ALUs are in a stall condition,
    meaning you're effectively losing 3 or 4 clock cycles (actually it's 5 because of a quirk in xmm access).

    Still convinced you can hand-micro-optimize better than the compiler?

    Using %xmm0 again, while seemingly slower, actually performs better because it doesn't cause a stall. (And actually the timing on the CPU is wrong, because that includes the extra time to get the FPU to complete the operation, which you don't have to wait for: the next instruction will start before the ucomisd has completed, unless it uses the result of course.)




    Also, the code you posted is horribad. I don't know which compiler generated that, but you either used very bad settings or you're using a compiler that was made in the previous century (well into the previous century).
    It's obviously 64-bit code, so there's no reason to force it to use anything other than __fastcall.

    If I try the above in VC2010 with proper optimization settings, that compare results in:
    Code:
        movsdx  xmm0, QWORD PTR [rcx]
        comisd  xmm0, QWORD PTR [rdx]
        jbe     comparebe
        or      eax, -1
        ret     0

    comparebe:
        xor     eax, eax
        ret     0
    That's about as optimal as you can make it, even with hand optimizing. There's a somewhat oddball trick you can pull to make the comparebe branch a notch faster (at the expense of making the not-comparebe path a notch slower), which would remove the conditional jump. There may be a few cases where that pays off, but in general the difference won't matter.
    Again, the compiler doesn't do this because it's "general purpose" and assumes the first compare you wrote in C is the more important one.

    Also note that the compiler is smart enough not to do 2 compares. It's also smart enough to see that it doesn't need a stack frame, and it uses the incoming values directly from the registers using the __fastcall convention (which is the norm for win64).

    Now tell me again you're not convinced that compilers do a good job at optimizing. I can guarantee you that of the few programmers who know assembly, only a minute number would have written anything resembling the above.
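
    The post doesn't say which trick is meant for removing the conditional jump. One common way to express a jump-free version at the C level, shown here purely as an assumption, is to turn the comparison result into the return value arithmetically. The sketch mirrors the two outcomes visible in the listing above (-1 when the first value is greater, otherwise 0); the names `gt_branchy`, `gt_branchfree` and `check_same` are hypothetical, and the compiler is still free to emit a jump or not for either form.

```c
#include <assert.h>

/* Two-outcome compare written with a conditional, mirroring the shape
   of the listing: -1 when *a > *b, otherwise 0. */
static int gt_branchy(const double *a, const double *b)
{
    if (*a > *b) return -1;
    return 0;
}

/* Same result without a conditional in the source: (*a > *b) is 0 or 1,
   and negating it gives 0 or -1. */
static int gt_branchfree(const double *a, const double *b)
{
    return -(*a > *b);
}

/* Assert that both forms return the same value for one pair. */
static void check_same(double a, double b)
{
    assert(gt_branchy(&a, &b) == gt_branchfree(&a, &b));
}
```

    Whether this pays off depends on how predictable the comparison is for the data at hand, which is again something only a measurement on real datasets can settle.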

  6. #51
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    Code:
        movsdx  xmm0, QWORD PTR [rcx]
        comisd  xmm0, QWORD PTR [rdx]
        jbe     comparebe
        or      eax, -1
        ret     0

    comparebe:
        xor     eax, eax
        ret     0
    OReubens, would you like to post the full listing of the output? This part looks to me like a standard 3OE approach. And it doesn't contradict what I said about:
    Code:
     ucomisd (%rax), %xmm1

  7. #52
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by dglienna View Post
    They'll be laughing at this thread in 50 years, I'd bet
    Why will they be laughing?

    Do you mean,

    1. they have the same discussion then because the issue still hasn't been resolved, or
    2. the issue has been resolved and one side has won?

    In the second case would you please inform us who the future winner is and why?
    Last edited by razzle; August 8th, 2014 at 03:18 PM.

  8. #53
    Join Date
    Jan 2006
    Location
    Fox Lake, IL
    Posts
    15,007

    Re: IF-free conception.

    Sorry. My crystal ball is broken...
    David

