  1. #16
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

Razzle, we can argue to infinity and beyond. I'd like to see real benchmarks: you could run fsort(no-if) on your Intel-based machine(s), and only that can be the last word in such a discussion.

    P.S.

Please switch off the compiler's optimizations. I used GCC 4.8 on openSUSE 13.1 x64.

  2. #17
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
we can argue to infinity and beyond. I'd like to see real benchmarks.
If you find a sort() to be a performance bottleneck, then of course locate and use the fastest replacement you can lay your hands on. But that isn't what we've been discussing. You claim programmers should routinely perform hardware-specific micro-optimizations throughout, and in my view that's extremely counterproductive behavior. The code becomes messier and often even slower.

Besides, sort algorithms belong to systems software. They're not something an application programmer writes from scratch (apart from naive newbies). The same goes for all the other algorithms you normally find in standard libraries. There are very fast and efficient replacement libraries available, written by experts. Just benchmark your fsort(no-if) against the sort implementations you find in libraries from nVidia and Intel and I promise you a big surprise.

Finally, the correct approach for an application programmer faced with a sorting-induced performance bottleneck is not to look for faster sorting algorithms. It is to look for ways to replace the sorting altogether with a lower-complexity alternative. That could even mean a major redesign, but so be it.

  3. #18
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

Quote Originally Posted by razzle View Post
Besides, sort algorithms belong to systems software. They're not something an application programmer writes from scratch (apart from naive newbies). The same goes for all the other algorithms you normally find in standard libraries. There are very fast and efficient replacement libraries available, written by experts. Just benchmark your fsort(no-if) against the sort implementations you find in libraries from nVidia and Intel and I promise you a big surprise.
Razzle, IT has no magic pills. Even just for sorting algorithms, there are a lot of variables:

1. data structures (an array sorts fast but performs poorly at most other operations).
2. the number of items to sort.
3. the type of implementation (CPU-based, GPU-based, or hardware built-ins).
3.1. CPU-based ones may or may not be multithreaded.
--------------------------------
It's easy to see that specifically tuned implementations have the highest performance.

As for fsort(no-if), it has two hardware-independent features that place it among the superior algorithms on long and very long runs, even without intensive microcoding. Actually, all that asm work was needed only to improve speed on relatively short runs. From the very inception I've been pursuing one goal: a purely quicksort-based sort with no need for an insertion sort (or anything else) for small subsets. Plus, this algorithm can be multithreaded quite easily. In other words, it's a very strong beast and can be improved/adapted in many ways.
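For readers wondering what "no-if" can mean concretely, here is a minimal sketch (an illustration of the general branchless idea, not the actual fsort code) of ordering two floats without a conditional jump; a compiler will typically lower the ternaries to minss/maxss or cmov instead of a branch:

[code]
// Hypothetical sketch: put the smaller value in a, the larger in b,
// with no conditional branch in the generated code. Compilers usually
// emit minss/maxss (or cmov) for these ternaries rather than a jump.
inline void cswap_branchless(float& a, float& b)
{
    float lo = (a < b) ? a : b;  // candidate for minss
    float hi = (a < b) ? b : a;  // candidate for maxss
    a = lo;
    b = hi;
}
[/code]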
Quote Originally Posted by razzle View Post
If you find a sort() to be a performance bottleneck, then of course locate and use the fastest replacement you can lay your hands on. But that isn't what we've been discussing. You claim programmers should routinely perform hardware-specific micro-optimizations throughout, and in my view that's extremely counterproductive behavior. The code becomes messier and often even slower.
Here's a curious overview:
    Until recently software developers have relied on the increasing capacity of hardware to compensate for increased software inefficiency, but the benefits of the so-called Moore’s Dividend are running out [3]. While the number of transistors on a chip is still doubling every two years, the gain in number of transistors can no longer be used to increase individual processor performance due to insufficient instruction level parallelism in a program and a chip's power dissipation limit. Instead, the gain is being used to increase the number of processors on a chip. Therefore, unless the application itself is highly parallel in nature the potential performance improvement from increased hardware capacity has reached its limit. To accommodate future computing requirements it is necessary that accidental growth be minimized. Hence, software efficiency is becoming more important.

    Current software development practices (particularly reliance on shared libraries, elaborate class hierarchies, and layers of function calls), while increasing programmer productivity, also lead to software bloat and inefficient program execution. Existing compiler technology generally includes optimization options, but it neither removes unneeded or unreachable code from dynamically linked libraries nor simplifies class hierarchies. Because of this, existing technologies do not significantly reduce software bloat nor improve execution efficiency. Tools are needed to reduce software bloat and collapse software layers and thus improve software efficiency and robustness without impacting developer productivity.

    http://www.navysbir.com/n13_1/N131-061.htm
Actually, all of high-level programming has only ever been about developer productivity. For true HPC we've had just asm and our brains; in all other cases automation gets very close to zero. If such dramatically advanced tools are ever realized, programmers will be bought wholesale.

  4. #19
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
    IT has no magic pills.
That's right, and that's why I'm not agreeing with you. You're claiming that application programmers should start applying hardware-specific micro-optimizations throughout, and that's more like a suicide pill.

Of course, if you're a systems programmer you should apply every trick in the book to make it run as fast as possible on the target system/hardware. I don't disagree with that. I'm not even saying that programmers should never micro-optimize. Certainly they should occasionally, but not routinely.

And of course there's room for improvement in systems software and tools, as the article you referenced suggests.

    "OBJECTIVE: Develop techniques, methods and tools for improving software execution efficiency and reducing software bloat while preserving the programming productivity gains offered by modern software engineering practices."

    The "modern software engineering practices" mentioned here are the total opposite of what you are suggesting. Your arguments mirror the opponents of high-level languages who claimed half a century ago that this is a fad and that to be efficient application programs must be written in assembly language.

Furthermore, the article specifically points out the challenges posed by the multiple cores of today's CPUs and GPUs, but nowhere does it say that "modern software engineering practices" should be abandoned, as you suggest.
    Last edited by razzle; July 28th, 2014 at 12:27 AM.

  5. #20
    Join Date
    Jul 2005
    Location
    Netherlands
    Posts
    2,042

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
Please switch off the compiler's optimizations.
    Are you benchmarking unoptimized code? That's not very useful.
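For what it's worth, a minimal timing harness like the sketch below (assuming g++ and std::sort as a baseline; not code from this thread) makes the point: build it once with -O0 and once with -O2 and the numbers differ by several times.

[code]
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
    // One million random floats as the workload.
    std::vector<float> v(1 << 20);
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (float& x : v) x = dist(rng);

    auto t0 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    std::printf("sorted %zu floats in %lld ms\n",
                v.size(), static_cast<long long>(ms.count()));
    return 0;
}
[/code]

Compile with "g++ -O2 bench.cpp" and again with "g++ -O0 bench.cpp" and compare; unoptimized timings say nothing about optimized code.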
    Cheers, D Drmmr

    Please put [code][/code] tags around your code to preserve indentation and make it more readable.

    As long as man ascribes to himself what is merely a possibility, he will not work for the attainment of it. - P. D. Ouspensky

  6. #21
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

D Drmmr, I did run optimized code. However, there could be problems with the smarter modes of compilers, because the algorithm uses the rXX, xmmXX, and mmX registers as global variables. Nevertheless, nothing prevents us from running the tests in separate files (one for fsort(no-if) and a second for the others).

Quote Originally Posted by razzle View Post
That's right, and that's why I'm not agreeing with you. You're claiming that application programmers should start applying hardware-specific micro-optimizations throughout, and that's more like a suicide pill.

Of course, if you're a systems programmer you should apply every trick in the book to make it run as fast as possible on the target system/hardware. I don't disagree with that. I'm not even saying that programmers should never micro-optimize. Certainly they should occasionally, but not routinely.

And of course there's room for improvement in systems software and tools, as the article you referenced suggests.
Razzle, I haven't suggested microcoding the whole thing. Each algorithm has critical parts where the work yields a significant speed-up. For instance, if a block consumes only 0.1% of the overall time, there is zero reason to spend any effort on it.

Quote Originally Posted by razzle View Post
Furthermore, the article specifically points out the challenges posed by the multiple cores of today's CPUs and GPUs, but nowhere does it say that "modern software engineering practices" should be abandoned, as you suggest.
The author has just expressed his wishes/dreams/hopes to us. In fact, we have nothing even remotely resembling those magic tools.

  7. #22
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
I haven't suggested microcoding the whole thing.
I no longer know what it is you're suggesting.

But it's my firm belief that ordinary application programmers shouldn't dabble in the type of micro-optimizations you're suggesting. That's for experts writing systems software. If you feel you've come up with some improved "algo", it's the counterparts from nVidia and Intel you should be benchmarking against.

Anyway, I feel we're not getting anywhere, so I'll drop it here. Thank you.

Quote Originally Posted by S@rK0Y View Post
The author has just expressed his wishes/dreams/hopes to us. In fact, we have nothing even remotely resembling those magic tools.
It's pretty unclear why you posted this reference, but to me it looks more like a concrete proposal than a vision.

    And that's very unfortunate. This proposal is a bureaucrat's wet dream. You take any piece of crappy software and apply a magic "tool" and out comes the best program ever written.

Unfortunately it doesn't work like that. Everything that's needed to write good software is already available. All you have to do is employ highly skilled programmers. That's all the magic you need, even in the navy.
    Last edited by razzle; July 29th, 2014 at 04:58 AM.

  8. #23
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: IF-free conception.

    Rule 1:
    READABLE and MAINTAINABLE code is 100x more important than incomprehensible magic tricks.

    Rule 2:
For 98% of the code you write, performance is not relevant.
For the remaining 2%, 99% of the time a 'reasonable' readable/maintainable implementation will be fast enough.

    Rule 3:
There are exceptional cases where performance will matter, though I've rarely seen sorting be one of them. In the few cases where it was the issue, the solution was removing the need to sort entirely rather than "writing a better sort".

    Rule 4:
A modern compiler is typically better at micro-optimizations than you are. And the bonus is that the compiler will happily do that for ALL of your code, even the parts where it doesn't matter.

    Rule 5:
Optimisation is hard. If you do end up needing to go this route, then I fully expect ORDER OF MAGNITUDE improvements, not the 'a few %' faster that micro-optimisations will give you. Percentage improvements drown in the noise or land in the "it doesn't make a difference" realm. If I have to wait 10 minutes for a result, I don't really care if several hours of optimising turn that into 9 minutes (which is 10%, significant by micro-optimizing standards) if it means the code becomes unreadable/unmaintainable.
If you can bring it down to several seconds, THEN we're talking.

That kind of improvement doesn't come from micro-optimising; it comes from algorithmic changes, which are typically easier to make in a higher-level language. Rarely does it come from being able to use a specific processor feature that a compiler for some reason can't exploit.


And just in case you're wondering: I do optimization for a living, both on real live code and by giving lectures on the various topics.


    ---
Your example also makes little sense as a demonstration: you're comparing qsort to 'fsort', but you give no explanation of what kind of qsort, or of what this fsort is.

qsort is a good "general purpose" sorter. It is by no means the fastest for any particular type of data, but it gives good performance in most cases. It does this by striking a good average among the number of compares, the number of item swaps, memory usage, and internal management. Some datasets suffer from compares, others from item swaps, and a sort that is tuned for fewer compares or fewer swaps (or both) will win on those.

So as it stands, we have two different algorithms being compared, and yours is better. Congratulations. I can do the same, give even more impressive numbers, and have it break down entirely if you aren't sorting the same kind of data as I am.
It means little.
Comparing the sort times of different algorithms is not very useful precisely because they're very sensitive to the data.


It doesn't really prove anything at all about your "NO IF" concept. Now go write your fsort with ifs, compile both with compiler optimisations turned on, and maybe, very maybe, then we can start an actual discussion.

if is by no means the demon it used to be. On an old CPU a conditional branch could take a dozen or so clock cycles, so a 'trick' here and there might get you a bit of extra performance. But on modern CPUs a branch costs no overhead at all when correctly predicted, and only a small penalty when not. And new CPUs on the horizon are already taking branch prediction to the level where both the true and false paths are pre-executed, so a branch effectively takes no time; no amount of trickery is going to improve on that. On the contrary, with a premise like that, the overhead of the trick may very well work against you.
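To make the trade-off concrete, here is a small sketch (an illustration only, not anyone's production code): the branchy version is nearly free when the data makes the branch predictable, while the branchless version pays the same fixed cost per element no matter what the data looks like.

[code]
#include <cstddef>

// Branchy: on sorted or otherwise predictable data the branch predictor
// is right almost every time, so the "if" costs next to nothing.
long long sum_above_branchy(const int* a, std::size_t n, int t)
{
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] > t) s += a[i];
    return s;
}

// Branchless: the comparison becomes an all-ones/all-zeros mask, so there
// is no conditional jump at all -- but also no benefit from predictability.
long long sum_above_branchless(const int* a, std::size_t n, int t)
{
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        int mask = -(a[i] > t);   // 0 if false, -1 (all bits set) if true
        s += a[i] & mask;         // adds a[i] or 0, no jump
    }
    return s;
}
[/code]

On random data the branchless version typically wins; on sorted data the branchy one usually does.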

  9. #24
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: IF-free conception.

Also... both compiler designers and CPU designers optimize their compilers/CPUs to achieve the best performance for "common usage scenarios". So sticking with 'normal' code makes sense; in the long run, all sorts of CPU tricks become less and less effective with every generation of compiler/CPU.

  10. #25
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    2 Razzle
Quote Originally Posted by razzle View Post
I no longer know what it is you're suggesting.
Actually, from the very start I've suggested nothing. I just shared the code so it could be tested on different Intel-based machines.

Quote Originally Posted by razzle View Post
If you feel you've come up with some improved "algo", it's the counterparts from nVidia and Intel you should be benchmarking against.
This solution is CPU-hosted code; it's incorrect to pit it against GPU-hosted ones. GPU muscle is a story of its own. Perhaps I'll write something for GPUs later.
Quote Originally Posted by razzle View Post
Unfortunately it doesn't work like that. Everything that's needed to write good software is already available. All you have to do is employ highly skilled programmers. That's all the magic you need, even in the navy.
The author of that article clearly explained the situation: the world doesn't have that many highly skilled people.

    2 OReubens
Quote Originally Posted by OReubens View Post
Rule 1:
READABLE and MAINTAINABLE code is 100x more important than incomprehensible magic tricks.

Rule 2:
For 98% of the code you write, performance is not relevant.
For the remaining 2%, 99% of the time a 'reasonable' readable/maintainable implementation will be fast enough.

Rule 3:
There are exceptional cases where performance will matter, though I've rarely seen sorting be one of them. In the few cases where it was the issue, the solution was removing the need to sort entirely rather than "writing a better sort".

Rule 4:
A modern compiler is typically better at micro-optimizations than you are. And the bonus is that the compiler will happily do that for ALL of your code, even the parts where it doesn't matter.

Rule 5:
Optimisation is hard. If you do end up needing to go this route, then I fully expect ORDER OF MAGNITUDE improvements, not the 'a few %' faster that micro-optimisations will give you. Percentage improvements drown in the noise or land in the "it doesn't make a difference" realm. If I have to wait 10 minutes for a result, I don't really care if several hours of optimising turn that into 9 minutes (which is 10%, significant by micro-optimizing standards) if it means the code becomes unreadable/unmaintainable.
If you can bring it down to several seconds, THEN we're talking.

That kind of improvement doesn't come from micro-optimising; it comes from algorithmic changes, which are typically easier to make in a higher-level language. Rarely does it come from being able to use a specific processor feature that a compiler for some reason can't exploit.
Your words are quite reasonable if you're talking about the desktop realm; HPC is a different beast. You can scale parallel computing up to insanity to run very slow code, or run a highly optimized algorithm on relatively cheap hardware. In fact, fsort(no-if) rests on two abstract techniques that significantly boost sorting (the first, reduction of conditional branches, is described in my blog; the second is checking whether a subset is already ordered). Actually, in HPC, microcoding and algorithmic change are tightly intertwined. For instance, how would you exploit a 3OE feature without microcoding???
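(The blog's exact formulation isn't reproduced in this thread; a plausible minimal sketch of the second technique, an "already ordered?" pre-check one might bolt onto a quicksort recursion, could look like this.)

[code]
#include <cstddef>

// Hypothetical sketch: before partitioning a subrange, scan it once.
// If it is already sorted, the whole recursion below it can be skipped.
// Worst case adds one O(n) pass per call, but on mostly-sorted inputs
// it prunes entire subtrees of the quicksort recursion.
static bool is_ordered(const float* a, std::size_t lo, std::size_t hi)
{
    for (std::size_t i = lo + 1; i <= hi; ++i)
        if (a[i] < a[i - 1]) return false;
    return true;
}

// Usage inside a quicksort-style routine:
//     if (is_ordered(a, lo, hi)) return;  // nothing to do here
//     ... partition and recurse as usual ...
[/code]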
Quote Originally Posted by OReubens View Post
qsort is a good "general purpose" sorter. It is by no means the fastest for any particular type of data, but it gives good performance in most cases. It does this by striking a good average among the number of compares, the number of item swaps, memory usage, and internal management. Some datasets suffer from compares, others from item swaps, and a sort that is tuned for fewer compares or fewer swaps (or both) will win on those.
----------------
Comparing the sort times of different algorithms is not very useful precisely because they're very sensitive to the data.
Absolutely true. I wrote the sort for floating-point numbers; for integers it's possible to get even more speed-up.

Quote Originally Posted by OReubens View Post
It doesn't really prove anything at all about your "NO IF" concept. Now go write your fsort with ifs, compile both with compiler optimisations turned on, and maybe, very maybe, then we can start an actual discussion.
I pitted it against the standard qsort: large arrays are where fsort(no-if) really shines. A high-level version would be good too, but apparently slower than this one.
Quote Originally Posted by OReubens View Post
Also... both compiler designers and CPU designers optimize their compilers/CPUs to achieve the best performance for "common usage scenarios". So sticking with 'normal' code makes sense; in the long run, all sorts of CPU tricks become less and less effective with every generation of compiler/CPU.
Masters of asm (I don't mean myself) prove otherwise. Frankly, fsort(no-if) proves the muscle of asm too. By the way, have you ever seen a video/audio codec written without any asm at all?

  11. #26
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
The author of that article clearly explained the situation: the world doesn't have that many highly skilled people.
This proposal doesn't consider the human-resource aspect at all. It seeks a purely technocratic solution in the form of a magic tool. But all the tools necessary to produce lean and mean software are already in place. You only have to add skilled programmers who know how to use them. There's no shortage, but you get what you pay for.

For some reason people think that anybody can be a programmer, especially if "modern software engineering practices" are applied; then programming is as easy as flipping a burger. I can't say I feel sorry for the organizations now sitting on a mountain of bloated and inefficient code. You bought cheap (sheep) and now you pay dearly.

There is no magic tool. Just look at Ada, another bureaucratic dream. Instead, employ well-educated, highly skilled programmers. It will cost, but it will work (or float, as they probably say in the navy).
    Last edited by razzle; July 30th, 2014 at 05:53 AM.

  12. #27
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
For instance, how would you exploit a 3OE feature without microcoding???
Precisely BECAUSE a compiler can do micro-optimizing a lot better than most programmers out there. Giving the compiler "the raw code" allows it more leeway in optimizing than trying to 'preset' it into a specific way of thinking.

And also BECAUSE doing this by hand and getting better performance takes a very experienced ASM programmer a lot of time, and it will typically only yield percentages. Most of the time that's not going to be significant enough to warrant the time/expense.

Quote Originally Posted by S@rK0Y View Post
I pitted it against the standard qsort: large arrays are where fsort(no-if) really shines. A high-level version would be good too, but apparently slower than this one.
Yes, but this is exactly why it makes little sense to compare. Or rather, it makes a case for the sort algorithm itself, 'qsort' vs 'fsort', rather than for an "if-based" vs "no-if" concept.
If you want fast sorting of floats, then a radix-type sort (splitting over the exponent, then the mantissa) is going to beat the pants off both qsort and fsort (at the cost of more memory).
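(For the curious, a minimal sketch of that idea; the key-mapping trick is standard for IEEE-754 floats, but the code below is an illustration rather than a tuned implementation, and the inverse mapping to recover the floats is omitted for brevity.)

[code]
#include <cstdint>
#include <cstring>
#include <vector>

// Map an IEEE-754 float to an unsigned key whose integer order matches
// the float order: flip all bits of negatives, flip only the sign bit of
// non-negatives.
static std::uint32_t float_key(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// LSD radix sort on the key bytes: four counting-sort passes, no compares.
// Needs an extra buffer -- the memory trade-off mentioned above.
static void radix_sort(std::vector<std::uint32_t>& keys)
{
    std::vector<std::uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {0};
        for (std::uint32_t k : keys)
            ++count[((k >> shift) & 0xFFu) + 1];       // histogram
        for (int b = 0; b < 256; ++b)
            count[b + 1] += count[b];                  // prefix sums
        for (std::uint32_t k : keys)
            tmp[count[(k >> shift) & 0xFFu]++] = k;    // stable scatter
        keys.swap(tmp);
    }
}
[/code]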

Quote Originally Posted by S@rK0Y View Post
Masters of asm (I don't mean myself) prove otherwise. Frankly, fsort(no-if) proves the muscle of asm too. By the way, have you ever seen a video/audio codec written without any asm at all?
I never claimed there is no place for hand-crafted assembly code. There very much is. But there aren't a lot of people out there with the skills to make it really shine. Personally, if I can't at least imagine getting a 10x (that's 10 times, not 10%) improvement, I typically won't bother, because it's a painful, slow process and makes maintaining the code an even bigger pain. And quite often you'll need to make more than one version of the code to cater to different processor families.

And no, I'm also not claiming there is no place for 'percentage' improvements. For REALLY low-level code like encoders/decoders, graphics primitives, and matrix primitives, a few % can make a big difference, because they'll end up being called hundreds of thousands of times in the lifetime of your code.

In the 25 or so years I've been at this (so yes, I grew up with the really low-performance CPUs where hand-tuning that last % did matter), I've never seen a sort be the crucial slowdown. The only case I can remember was a polygon sorter (now largely superseded, because most 3D is done with a Z-buffer approach), where qsort just wasn't ideal: the dataset was typically too small for qsort to be a benefit, and it was usually already partially sorted because of how the 3D engine worked.

  13. #28
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    2 OReubens

Quote Originally Posted by OReubens View Post
Precisely BECAUSE a compiler can do micro-optimizing a lot better than most programmers out there. Giving the compiler "the raw code" allows it more leeway in optimizing than trying to 'preset' it into a specific way of thinking.

And also BECAUSE doing this by hand and getting better performance takes a (#1) very experienced ASM programmer a lot of time, and it will typically only yield percentages. Most of the time that's (#2) not going to be significant enough to warrant the time/expense.
You're absolutely right about #1, and I absolutely don't share #2: at large scale, highly optimized code takes times fewer ticks (at the very least) to accomplish the same task. Big Data bulges out every bottleneck in a given solution.
Quote Originally Posted by OReubens View Post
Yes, but this is exactly why it makes little sense to compare. Or rather, it makes a case for the sort algorithm itself, 'qsort' vs 'fsort', rather than for an "if-based" vs "no-if" concept.
Well, for instance, a purely if-based solution takes 8 "ifs" per loop (even 11 in the worst realization), but the if-reduction strategy makes it possible to get by with only 3 "ifs". Not so bad, is it? (A sketch of one such trick follows below.)
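(The specific reduction used in fsort isn't shown in this thread; the sketch below illustrates one classic way to fold two per-iteration "ifs" into one: a sentinel value past the end of the range makes the value test double as the bounds check. The spare slot at a[n] is an assumption the caller would have to guarantee.)

[code]
#include <cstddef>

// Hypothetical sketch of if-reduction via a sentinel: the scan loop no
// longer needs a separate "i < n" test, because the sentinel guarantees
// the value comparison itself terminates the loop.
std::size_t scan_lt(float* a, std::size_t n, float pivot)
{
    float saved = a[n];          // assumes one spare slot past the end
    a[n] = pivot;                // sentinel: comparison must fail here
    std::size_t i = 0;
    while (a[i] < pivot) ++i;    // one "if" instead of two per iteration
    a[n] = saved;                // restore the borrowed slot
    return i;
}
[/code]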
Quote Originally Posted by OReubens View Post
If you want fast sorting of floats, then a radix-type sort (splitting over the exponent, then the mantissa) is going to beat the pants off both qsort and fsort (at the cost of more memory).
Yes, it's a well-known fact that a memory-gobbling approach can easily skyrocket performance, but (with Big Data) you quickly run out of memory and the speed-up is ruined completely.
Quote Originally Posted by OReubens View Post
I never claimed there is no place for hand-crafted assembly code. There very much is. But there aren't a lot of people out there with the skills to make it really shine. Personally, if I can't at least imagine getting a 10x (that's 10 times, not 10%) improvement, I typically won't bother, because it's a painful, slow process and makes maintaining the code an even bigger pain. And quite often you'll need to make more than one version of the code to cater to different processor families.
Perfectly true, but there's no way around it: HPC is a cruel hell that burns developers' brains down to ashes.

  14. #29
    Join Date
    Feb 2013
    Posts
    58

    Re: IF-free conception.

    Quote Originally Posted by razzle View Post
This proposal doesn't consider the human-resource aspect at all. It seeks a purely technocratic solution in the form of a magic tool. But all the tools necessary to produce lean and mean software are already in place. You only have to add skilled programmers who know how to use them. There's no shortage, but you get what you pay for.

For some reason people think that anybody can be a programmer, especially if "modern software engineering practices" are applied; then programming is as easy as flipping a burger. I can't say I feel sorry for the organizations now sitting on a mountain of bloated and inefficient code. You bought cheap (sheep) and now you pay dearly.

There is no magic tool. Just look at Ada, another bureaucratic dream. Instead, employ well-educated, highly skilled programmers. It will cost, but it will work (or float, as they probably say in the navy).
Razzle, in fact, true HPC was destroyed by the GHz race in CPUs. Who cares about optimization if a new "stone" will give more GHz muscle??? Some years ago many "experts" dreamed of a 10 GHz "stone". But that road has run out, and now more sophisticated techniques are needed to accelerate. Second point: the rest of HPC software has been pouring in from COTS with some kind of "tuning".

  15. #30
    Join Date
    Jul 2013
    Posts
    576

    Re: IF-free conception.

    Quote Originally Posted by S@rK0Y View Post
in fact, true HPC was destroyed by the GHz race in CPUs. Who cares about optimization if a new "stone" will give more GHz muscle??? Some years ago many "experts" dreamed of a 10 GHz "stone". But that road has run out, and now more sophisticated techniques are needed to accelerate. Second point: the rest of HPC software has been pouring in from COTS with some kind of "tuning".
    Sure, performance no longer comes from an ever increasing clock rate but from an increasing number of cores. And that puts concurrent programming at the centre stage.

But the solution is not a holy grail or a fifth element (as the proposal you referred to suggests). The solution is much more mundane than that, namely skilled programmers.

If the navy has money to spare, they should put it into Ada, a "magic tool" style project the military already started. Most of the existing navy software should be in Ada already, at least if they live by their own rules. Any advances should be made public and freely available for the benefit of everyone.

    The Second Coming of Ada would be a much better project, and I think the language is worth it.
    Last edited by razzle; July 31st, 2014 at 03:25 AM.
