|
-
April 29th, 2009, 07:21 AM
#1
Inaccuracy of QueryPerformanceCounter in "small" functions
Hi,
I need to check the performance of a code snippet, and for this reason, I need to measure the execution time or cycles.
I learned that the function QueryPerformanceCounter retrieves the current value of the high-resolution performance counter. I used it before executing a loop, and afterward. Subtracting the latter value from the former should give me the time (or cycles) spent to execute the loop.
The results were quiet surprising. Below you can find the code followed by a typical output.
When I'm executing small loops, such as 10 or 100 times, the execution time of a loop and another loop that is only 10 times bigger, are not proportional at all!
However when the loop increases to 10,000 or 100,000, the execution time begins to appear proportional and increases by 10, just as the amount the loop size increases.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <windows.h>
#include <winbase.h>
int main()
{
LARGE_INTEGER ticksPerSecond;
LARGE_INTEGER start_ticks, end_ticks, cputime;
double delay_xor;
int j;
long l1,l2,l3;
l1=0xABCDEF12; //random 32 bits values in hexadecimals
l2=0x12345678;
l3=0xAB12CD34;
j = sizeof(double); //8 bytes
j = sizeof(long); //4 bytes
j = sizeof(LARGE_INTEGER); //8 bytes
QueryPerformanceCounter(&start_ticks);
for(j=0;j<1;j++) //loop once
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\n First Loop: %.2f",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<10;j++) //loop 10 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\nSecond Loop: %.2f",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<100;j++) //loop 100 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\n Third Loop: %.2f",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<1000;j++) //loop 1000 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\nFourth Loop: %.2f\n",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<10000;j++) //loop 10000 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\n Fifth Loop: %.2f",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<100000;j++) //loop 100000 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\n Sixth Loop: %.2f",delay_xor);
QueryPerformanceCounter(&start_ticks);
for(j=0;j<1000000;j++) //loop 100000 times
l3 = l1^l2+j; QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
printf("\nSeventh Loop: %.2f\n\n",delay_xor);
return 0;
Please note the output:
First Loop: 18.00
Second Loop: 21.00
Third Loop: 23.00
Fourth Loop: 62.00
Fifth Loop: 452.00
Sixth Loop: 4331.00
Seventh Loop: 44476.00
Press any key to continue . . .
Is there any explanation of the non-linearity of the outcome in small loops?
Is there any other accurate alternative to measure the execution time of small loops?
Thanks in advance.
Last edited by Elias.pp; April 29th, 2009 at 07:30 AM.
Reason: Typo
-
April 29th, 2009, 07:44 AM
#2
Re: Inaccuracy of QueryPerformanceCounter in quickly executing snippets
Did you ever thought that this your code:
Code:
QueryPerformanceCounter(&end_ticks);
cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
delay_xor = (double)cputime.QuadPart;
needs a liitle time for the execution too?
And assuming that your original code for tests:
Code:
long l1,l2,l3;
....
for(j=0;j<1;j++) //loop once
l3 = l1^l2+j;
is very simple and fast, the *additional* time for the estimation is relative high!
Victor Nijegorodov
-
April 29th, 2009, 09:56 AM
#3
Re: Inaccuracy of QueryPerformanceCounter in quickly executing snippets
You should take a look at the generated assembler code by the compiler.
When I built your code using Visual Studio 2008 (in release mode), it completely optimized away your loops. Since the result of the loop wasn't used, the compiler determined that it didn't even need the loop and completely removed it.
I modified the printf statements to also print out the l3 variable so that the result was needed. After doing this, the compiler still removed the first 2 loops (it computed the value for l3 as part of the optimization). The rest of the loops (on my system) generated code.
Depending on your compiler and on the optimization level, you may see different results.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|