2 Razzle
Quote:
I don't know any longer what it is you're suggesting?
Actually, from the very start I've suggested nothing. I just shared the code so it could be tested on different Intel-based machines.
Quote:
If you feel you have come up with some improved "algo" it's the counterparts from nVidia and Intel you should be benchmarking against.
This solution is CPU-hosted code, so it's incorrect to pit it against GPU-hosted ones. GPU muscle is a different matter. Perhaps I'll write something for it later.
Quote:
Unfortunately it doesn't work like that. Everything that's needed to write good software is already available. All you have to do is employ highly skilled programmers. That's all magic you need also in the navy.
The author of that article clearly explained the situation: the world doesn't have that many highly skilled people.
2 OReubens
Quote:
Rule 1:
READABLE and MAINTAINABLE code is 100x more important than incomprehensible magic tricks.
Rule 2:
for 98% of the code you write performance is not relevant.
for the remaining 2%, 99% of the time a 'reasonable' readable/maintainable implementation will be fast enough.
Rule 3:
there are exceptional cases where performance will matter. Though I've rarely seen sorting to be one of them. The few cases where it has been the issue, the solution was removing the need to sort entirely rather than "writing a better sort".
Rule 4:
A modern compiler is typically better at micro optimizations than you are. And the bonus is that the compiler will happily do that for ALL of your code, even the parts where it doesn't matter.
Rule 5:
Optimisation is hard. If you do end up needing to go this route, then I fully expect to get ORDER OF MAGNITUDE type improvements, not the kind of 'a few %' faster that microoptimisations will give you. Percentage improvements are drowned in the noise or in the "it doesn't make a difference" realm. If I have to wait 10minutes for a result, then I don't really care if several hours of optimising will turn that into 9minutes (which is 10%, which is significant for microoptimizing) if it means the code becomes unreadable/unmaintainable.
If you can bring it to several seconds, now THEN we're talking.
That kind of improvements don't come from microoptimising, they come from algorithmic changes, which is typically easier to do in a higher level language. Rarely do they come from the fact you can make use of a specific processor feature that a compiler for some reason can't exploit.
Your words are quite reasonable if you're talking about just the desktop realm. HPC is a different beast. You can scale parallel computing up to insanity to run very slow code, or you can run a highly optimized algo on relatively cheap hardware. In fact, fsort(no-if) uses two abstract algos to significantly boost sorting: the 1st, reducing conditional branches, is described in my blog; the 2nd is checking whether a subset is already ordered. Actually, in HPC, microcoding and algorithmic changes are tightly intertwined. For instance, how would you exploit the 3OE feature with no microcoding???
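The two ideas mentioned above (fewer conditional branches, and a check for already-ordered input) can be sketched in C roughly as follows. This is not the fsort(no-if) code itself. The names `order_pair`, `is_sorted` and `sort_few_branches` are made up for illustration, and the O(n^2) odd-even transposition sort is swapped in only because its compare-exchange sequence is fixed in advance and does not depend on the data. It also assumes the input contains no NaNs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: leave the smaller value in *a and the larger in *b.
   Written with ternaries so that a compiler can emit min/max instructions
   rather than a conditional jump (likely, but not guaranteed).
   Assumes neither value is NaN. */
static void order_pair(float *a, float *b)
{
    float x = *a, y = *b;
    *a = (x < y) ? x : y;
    *b = (x < y) ? y : x;
}

/* Returns 1 if a[0..n-1] is non-decreasing, 0 otherwise.
   The comparison result is accumulated with OR, so the loop body
   contains no data-dependent branch. */
static int is_sorted(const float *a, size_t n)
{
    unsigned out_of_order = 0;
    for (size_t i = 1; i < n; i++)
        out_of_order |= (unsigned)(a[i - 1] > a[i]);
    return !out_of_order;
}

/* Odd-even transposition sort: n passes of compare-exchanges on
   alternating pairs. Control flow depends only on n, never on the data,
   apart from the single up-front sortedness check. */
static void sort_few_branches(float *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (is_sorted(a, n))            /* one branch per call, not per element */
        return;
    for (size_t pass = 0; pass < n; pass++)
        for (size_t i = pass & 1; i + 1 < n; i += 2)
            order_pair(&a[i], &a[i + 1]);
}
```

Whether `order_pair` really compiles without a jump depends on the compiler and flags, so it is worth checking the generated assembly before drawing conclusions.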
Quote:
qsort is a good "general purpose" sorter, it is by no means the fastest for particular type of data it gives good performance for most cases. It does this by giving good averages as to the number of compares, the number of item swaps, memory usage and internal management.
some datasets may suffer from compares or may suffer from item swaps. and a sort that is optimized to optimize for fewer compares or fewer swaps (or both)
----------------
Comparing sort times of different algo's is not very useful precisely because they're very sensitive to data.
Absolutely true :) I wrote this sort for floating-point numbers. For integers, even more speed-up is possible.
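One plausible reason integers are friendlier here is that a compare-exchange can be built from plain bitwise arithmetic. The sketch below is only an illustration of that general trick, not something taken from fsort; `order_pair_i32` is a made-up name, and the code assumes the usual two's-complement conversion between `uint32_t` and `int32_t`.

```c
#include <stdint.h>

/* Hypothetical helper: leave the smaller value in *a and the larger in *b
   using a mask instead of an if. */
static void order_pair_i32(int32_t *a, int32_t *b)
{
    int32_t x = *a, y = *b;
    /* mask is all ones when the pair is out of order, all zeros otherwise */
    uint32_t mask = -(uint32_t)(x > y);
    /* XOR-ing both values with (x ^ y) swaps them; with 0 it does nothing */
    uint32_t diff = ((uint32_t)x ^ (uint32_t)y) & mask;
    *a = (int32_t)((uint32_t)x ^ diff);
    *b = (int32_t)((uint32_t)y ^ diff);
}
```

A compiler may well turn an ordinary `if` into the same thing (or into a conditional move), so this form is mainly useful when the generated code shows that it did not.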
Quote:
it doesn't really prove anything at all about your "NO IF" concept. Now go write your fsort with if's, compile both with compiler optimisations turned on, and maybe, very maybe, then we can start an actual discussion.
I pitted it against the standard qsort: larger arrays bring out the real strength of fsort(no-if). A high-level version would be good too, but apparently slower than this one.
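For anyone who wants to repeat this kind of comparison, a minimal timing harness around the standard `qsort` might look like the sketch below. It is not the harness used for the numbers in this thread. `time_sort`, `qsort_floats` and `fill_pseudo_random` are made-up names, and `clock()` measures CPU time at coarse resolution, so small arrays need many repetitions to give a meaningful figure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Comparator for qsort: returns negative, zero or positive. Assumes no NaNs. */
static int cmp_float(const void *pa, const void *pb)
{
    float a = *(const float *)pa, b = *(const float *)pb;
    return (a > b) - (a < b);
}

/* Baseline: the C library's qsort, wrapped to a common signature. */
static void qsort_floats(float *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_float);
}

typedef void (*sort_fn)(float *, size_t);

/* Fills a[0..n-1] with reproducible values in [0, 1) from a simple LCG,
   so every sort under test sees exactly the same input. */
static void fill_pseudo_random(float *a, size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1664525u + 1013904223u;
        a[i] = (float)(seed >> 8) / 16777216.0f;
    }
}

/* Sorts a private copy of src and returns the CPU seconds spent in the
   sort call only; returns -1.0 if the copy cannot be allocated. */
static double time_sort(sort_fn sort, const float *src, size_t n)
{
    assert(src != NULL);
    if (n == 0)
        return 0.0;
    float *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    memcpy(buf, src, n * sizeof *buf);
    clock_t t0 = clock();
    sort(buf, n);
    clock_t t1 = clock();
    free(buf);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_sort(qsort_floats, data, n)` and then `time_sort` again with a second function of the same signature, over a range of `n`, gives a like-for-like comparison, provided both are built with the same optimisation flags.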
Quote:
Also... Both compiler designers as well as CPU designers will optimize their compilers/cpu's to achieve best performance for "common usage scenario's". so sticking with 'normal' code makes sense, in the long run, all sorts of CPU tricks are becoming less and less effective every generation of compiler/cpu.
Masters of asm (I don't mean myself ;) ) prove otherwise. Frankly, fsort(no-if) shows the muscle of asm too. By the way, have you ever seen codecs (video/audio) written with no asm at all?