Quote Originally Posted by GeoRanger

...but data in a digital computer is just a bunch of 0s and 1s. Those 0s and 1s could be an integer, text, music, or a picture. Consider the probably not portable code sample below...
Did you ever heard of non-POD types?


If the data you are swapping comes from memory, the compiler will have to fetch it and put it somewhere where the processor can operate on it, regardless of how you actually perform the swap. If the compiler decides to put it in registers, the actual swap could in principal be as simple as the C code implies consder...

Code:
    __asm
    {
        mov eax, 5
        mov ebx, -984
        xor eax, ebx
        xor ebx, eax
        xor eax, ebx
    }
Which is excessively slow : 3 CPU cycles! Absolutely not parallelizable!

While, with an intermediate temporary, a good compiler (I tested with GCC 3.2.3) may even produces no operation, and simply "rename" the variables as far as possible!
Or, a bad compiler, will simply generate:
Code:
mov edx,eax
mov eax,ebx ; 1 cycle (k6-2 processor)
mov ebx,edx ; 0.5 cycle (I mean that it can be paired with the next instruction)
But, a good compiler will often find "tricks" to optimize!

For example, with GCC 3.2.3:
Code:
#include <stdio.h>

void Swap(int& i,int& j)
{
int k=i;
i=j;
j=i;
}
int input();
int main()
{
int i,j;
i=input(); j=input();
if (input()) Swap(i,j);
printf("%d %d", i, j);
return 0;
}
Code:
#include <stdio.h>
int input()
{
int w;
scanf("%d",&w);
return w;
}
Produces:
Code:
0x401307 <main+23>:     call   0x401370 <_Z5inputv>
0x40130c <main+28>:     mov    esi,eax
0x40130e <main+30>:     call   0x401370 <_Z5inputv>
0x401313 <main+35>:     mov    ebx,eax
0x401315 <main+37>:     call   0x401370 <_Z5inputv>
0x40131a <main+42>:     test   eax,eax
0x40131c <main+44>:     je     0x401320 <main+48>
0x40131e <main+46>:     mov    esi,ebx
0x401320 <main+48>:     push   eax
0x401321 <main+49>:     push   ebx
0x401322 <main+50>:     push   esi
0x401323 <main+51>:     push   0x4012e1
0x401328 <main+56>:     call   0x401870 <printf>
0x40132d <main+61>:     lea    esp,[ebp-8]
0x401330 <main+64>:     pop    ebx
0x401331 <main+65>:     mov    eax,0x0
0x401336 <main+70>:     pop    esi
0x401337 <main+71>:     leave
0x401338 <main+72>:     ret
In fact, the swap operation itself has a cost of 1 CPU cycle (on a k6-2 processor), in the worst case, and none if input() returns zero!

Seriously, variable assignment are perfectly understood by the compier in the normal control flow of a program!

With that swap:
Code:
void Swap(int& i,int& j)
{
i^=j;
j^=i;
i^=j;
}
The compiler has been smart enough and understood that it has been written by a .... "optimizing" programmer...
Code:
0x401307 <main+23>:     call   0x401370 <_Z5inputv>
0x40130c <main+28>:     mov    esi,eax
0x40130e <main+30>:     call   0x401370 <_Z5inputv>
0x401313 <main+35>:     mov    ebx,eax
0x401315 <main+37>:     call   0x401370 <_Z5inputv>
0x40131a <main+42>:     test   eax,eax
0x40131c <main+44>:     je     0x401320 <main+48>
0x40131e <main+46>:     xchg   esi,ebx
0x401320 <main+48>:     push   eax
0x401321 <main+49>:     push   ebx
0x401322 <main+50>:     push   esi
0x401323 <main+51>:     push   0x4012e7
0x401328 <main+56>:     call   0x401870 <printf>
0x40132d <main+61>:     lea    esp,[ebp-8]
0x401330 <main+64>:     pop    ebx
0x401331 <main+65>:     mov    eax,0x0
0x401336 <main+70>:     pop    esi
0x401337 <main+71>:     leave
0x401338 <main+72>:     ret
But, on a less good optimizer (Borland C++ 5.0), the normal swap operation is translated to:
Code:
:00401169 8BC3           mov    eax,ebx
:0040116B 8BDE           mov    ebx,esi ; 1
:0040116D 8BF0           mov    esi,eax

:0040116F 56             push   esi ;2 (k6-2 has special optimization for mov operations, known as register renaming).
:00401170 53             push   ebx ; 3
:00401171 6834D14000     push   0040D134 ;4
:00401176 E84D2C0000     call   printf
While, the xor stuff, generate this code:
Code:
:00401169 33DE           xor    ebx,esi ;1
:0040116B 33F3           xor    esi,ebx ; 2
:0040116D 33DE           xor    ebx,esi
:0040116F 56             push   esi ;3
:00401170 53             push   ebx ;4
:00401171 6834D14000     push   0040D134 ; 5
:00401176 E84D2C0000     call   printf
In short, the swap operation used 2.5 cycles instead of 1.5 !

Quote Originally Posted by GeoRanger

On a different note, isn't it best to avoid using references in a function like this? Wouldn't "void swap(int * a, int * b)" make clearer at the point where this function is invoked that a and b might be altered. I try to limit my use of references to places where they are really needed, such as when overriding some operator or in a copy constructor.
Do you come from a C background?

I used the cannonical syntax of the standard library (std::swap), all C++ programmers can understand that perfectly, since they are used to std::swap!

In fact, std::swap is the best candidate for optimization, since the compiler is free to create optimizations specifically for that function.

Quote Originally Posted by GeoRanger
What is totally unobvious is that enqueue internally generates a pointer to the reference and stores it. pNode gets deleted later when it is dequeued. But all of this would be much more obvious if enqueue accepted the non-const pointer instead of a reference.
Here, a reference is not natural, because it is a "pointer sink". But for std::swap, it is natural (at least from the point-of-view of a C++ programmer). std::swap does not delete the data!
In fact, since it is a pointer sink, a std::auto_ptr would be very natural, and implicitly documents perfectly that the function "gets ownership" of the data.

There is a thread about references vs pointers.
http://www.codeguru.com/forum/showth...=383452&page=1