Long time ago I heard that the speed of floating point multiplication is much faster than division. Is it still the case today? (On a not very powerful netbook) Thanks!
Printable View
Long time ago I heard that the speed of floating point multiplication is much faster than division. Is it still the case today? (On a not very powerful netbook) Thanks!
Yes but there are so many other things to consider as well when predicting the execution speed. Here's some reading http://www.agner.org/optimize/instruction_tables.pdf
What I learned when I was doing CPU architecture research is that not only is division costly (about 40 cycles), vs a multiplication (about 10 cycles), divisions are not "pipe-able".
What this means is that once you order your processor to do the division, you'll have to wait until it is finished before starting a new division. Multiplications, on the other hand, you can request a new multiplication on every cycle. You'll still have to wait the full 10 cycles for each to finish. So for example, if you need to do 10 multiplications, you can spend the first 10 cycles requesting the multiplications, and the during the next ten cyces, you'll get 1 result per cycle.
All of this is very theoretical of course. The rule of thumb is "division is kind of expensive", but quite frankly, as always, these kinds of little optizations are the last thingg you should worry about. I've NEVER seen an algorithm where it really made a change.
If you must divide, then you must divide...
Thank you both! How about integer multiplication vs. shift? For example <<2 vs. *4 ? Thanks!
You know, if it really matters, measure.
A modern compiler will replace (if it's possible) a multiplication/division by a 2 base with shifts. An example from a MSVC2005 debug(!) buildCode:unsigned int a = 0x12345678;
004113BE mov dword ptr [a],12345678h
a /= 2;
004113C5 mov eax,dword ptr [a]
004113C8 shr eax,1
004113CA mov dword ptr [a],eax
a *= 2;
004113CD mov eax,dword ptr [a]
004113D0 shl eax,1
004113D2 mov dword ptr [a],eax
Even a non-modern compiler will do this.
Back in the world of 16-bit DOS systems (Visual C++ 1.x, Turbo C) , there was no guarantee that the machine had a floating point processor installed, so why introduce emulated FP when not necessary.
I know that for Turbo C, Borland made sure that the underlying code did not use any emulated floating point unless it was necessary (and sometimes you had to force the linker to use the floating point emulation).
Regards,
Paul McKenzie
Yes I remember that as well but for integer operations I doubt that the co-processor was in question. ;)
The reason I wrote modern was merely that I neither remember nor can test how such an old compiler behaves.
Edit: I just remembered that some years ago I cleaned up some old 16 bit stuff and before throwing it nostalgy made me make a virtual Win3.11 machine. Here's a screen dump :)
Thanks again everyone! :)
Mainly added for completeness: I think I can assure that MS C 6.0 (most probably early 90s, running under DOS) already did that "prefer SHL/SAR/SHR over MUL/IMUL/DIV/IDIV when possible" optimization. Probably even some earlier compilers did that as well, but I'm not definitely sure about that. IIRC they even did replace multiplications by constants with few one-bits with SHL/ADD combinations.
BTW, haven't seen anything like your Win 3.11 screen shot for a really long time... :D
And we shall all be grateful for what has replaced it. Even though I remember that back in those days we all were very exited. The change from running codeview in a DOS screen to running the debugger in a proper window was great. Today that old environment felt very very akward to handle though... :)
shift is ALWAYS faster. However, you shouldn't need to worry about that. The compiler is smart enough to know that and fix it for you. In fact, shift is so much faster that to multiply by six (which is not shiftable)
is faster thanCode:x = (y << 2) + (y << 1);
But again, your compiler should do that for you. And I'm not sure this is still the case, I know it was the case on 486 and before, but processors have changed a lot. No matter how much processors change though, a bit shift will always be faster than a multiplication, its probably actually the fastest instruction their is. Of course, it's only useful for integers, so it doesn't affect the original query.Code:x = y * 6;
Like everyone else said though, the pipeline and the cache have a much bigger impact.
Does the speed depend on how many bits are shifted? For example would >>16 take longer than >>1 ? Just curious. Thanks!
Not on any architecture I now of.
It may depend on the size of the object being shifted though, in particular, for when the long long type that doesn't fit inside a register (eg, it is emulated). Though in this case, to be fair, the instruction doesn't actually exist, the assembly code is hand written by the compiler writers.