CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 3 of 3
  1. #1
    Join Date
    Apr 2009
    Posts
    1

    Resolved Inaccuracy of QueryPerformanceCounter in "small" functions

    Hi,

    I need to check the performance of a code snippet, and for this reason, I need to measure the execution time or cycles.

    I learned that the function QueryPerformanceCounter retrieves the current value of the high-resolution performance counter. I used it before executing a loop, and afterward. Subtracting the latter value from the former should give me the time (or cycles) spent to execute the loop.

    The results were quiet surprising. Below you can find the code followed by a typical output.

    When I'm executing small loops, such as 10 or 100 times, the execution time of a loop and another loop that is only 10 times bigger, are not proportional at all!

    However when the loop increases to 10,000 or 100,000, the execution time begins to appear proportional and increases by 10, just as the amount the loop size increases.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <math.h>

    #include <time.h>
    #include <windows.h>
    #include <winbase.h>


    int main()
    {

    LARGE_INTEGER ticksPerSecond;
    LARGE_INTEGER start_ticks, end_ticks, cputime;
    double delay_xor;
    int j;
    long l1,l2,l3;

    l1=0xABCDEF12; //random 32 bits values in hexadecimals
    l2=0x12345678;
    l3=0xAB12CD34;

    j = sizeof(double); //8 bytes
    j = sizeof(long); //4 bytes
    j = sizeof(LARGE_INTEGER); //8 bytes


    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<1;j++) //loop once
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\n First Loop: %.2f",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<10;j++) //loop 10 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\nSecond Loop: %.2f",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<100;j++) //loop 100 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\n Third Loop: %.2f",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<1000;j++) //loop 1000 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\nFourth Loop: %.2f\n",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<10000;j++) //loop 10000 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\n Fifth Loop: %.2f",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<100000;j++) //loop 100000 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\n Sixth Loop: %.2f",delay_xor);

    QueryPerformanceCounter(&start_ticks);
    for(j=0;j<1000000;j++) //loop 100000 times
    l3 = l1^l2+j;
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    printf("\nSeventh Loop: %.2f\n\n",delay_xor);


    return 0;
    Please note the output:

    First Loop: 18.00
    Second Loop: 21.00
    Third Loop: 23.00
    Fourth Loop: 62.00
    Fifth Loop: 452.00
    Sixth Loop: 4331.00
    Seventh Loop: 44476.00

    Press any key to continue . . .
    Is there any explanation of the non-linearity of the outcome in small loops?

    Is there any other accurate alternative to measure the execution time of small loops?

    Thanks in advance.
    Last edited by Elias.pp; April 29th, 2009 at 07:30 AM. Reason: Typo

  2. #2
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,430

    Re: Inaccuracy of QueryPerformanceCounter in quickly executing snippets

    Did you ever thought that this your code:
    Code:
    QueryPerformanceCounter(&end_ticks);
    cputime.QuadPart = end_ticks.QuadPart- start_ticks.QuadPart;
    delay_xor = (double)cputime.QuadPart;
    needs a liitle time for the execution too?
    And assuming that your original code for tests:
    Code:
    long l1,l2,l3;
    ....
    for(j=0;j<1;j++) //loop once
    l3 = l1^l2+j;
    is very simple and fast, the *additional* time for the estimation is relative high!
    Victor Nijegorodov

  3. #3
    Join Date
    Jan 2001
    Posts
    253

    Re: Inaccuracy of QueryPerformanceCounter in quickly executing snippets

    You should take a look at the generated assembler code by the compiler.

    When I built your code using Visual Studio 2008 (in release mode), it completely optimized away your loops. Since the result of the loop wasn't used, the compiler determined that it didn't even need the loop and completely removed it.

    I modified the printf statements to also print out the l3 variable so that the result was needed. After doing this, the compiler still removed the first 2 loops (it computed the value for l3 as part of the optimization). The rest of the loops (on my system) generated code.

    Depending on your compiler and on the optimization level, you may see different results.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured