C++ Profiling: How do I determine the speed of a particular function or operation?

  1. #1
    Join Date
    Nov 2002
    Location
    Foggy California
    Posts
    1,245

    C++ Profiling: How do I determine the speed of a particular function or operation?

    Q: How do I determine the speed of a particular function or operation?

    A: Measuring the speed of a particular operation is commonly known as profiling. The term "profiling" also covers other information about an operation's profile, such as the number of calls to a function, but this FAQ focuses on determining the speed of a particular operation.


    Part I: How to Profile

    The best solution for profiling an operation is to use a profiler. Visual Studio has a good profiler. Linux has gprof (http://linuxgazette.net/100/vinayak.html). If your compiler doesn't come with a profiler and you will often be profiling your code, it might be worthwhile purchasing one that does.

    If you have to do without a professional profiler, you can usually embed your own timing code in your program. It may not be as accurate or as easy to use as a professional profiler, but then again, you get what you pay for.

    To profile an operation, you will need a high-precision timer. Pentium-compatible processors have an RDTSC (ReaD Time Stamp Counter) assembly instruction that reports the processor's current tick count. Windows operating systems usually support the QueryPerformanceCounter() WinAPI call. Some computers, usually laptops and handhelds, vary their clock frequency to balance CPU usage and battery life. On these computers, RDTSC and QueryPerformanceCounter() can report different information: RDTSC returns the number of CPU cycles an operation took, whereas QueryPerformanceCounter() returns the total elapsed time. (On newer processors whose time stamp counter runs at a constant rate regardless of clock frequency, RDTSC also behaves like an elapsed-time counter.) It is up to you to decide which piece of information matters more to you.
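
    If neither RDTSC nor QueryPerformanceCounter() is available on your platform, the standard <chrono> header (C++11 and later) offers std::chrono::steady_clock as a monotonic elapsed-time source. The following is only a minimal sketch; the class name ChronoTimer and its Start()/Stop() interface are made up for illustration, and the resolution you actually get depends on your standard library.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable timer; the name ChronoTimer is illustrative only.
class ChronoTimer
{
  std::chrono::steady_clock::time_point start_;
  bool running_ = false;

public:
  // Record the starting time point.
  void Start()
  {
    start_ = std::chrono::steady_clock::now();
    running_ = true;
  }

  // Return the elapsed time since Start(), in seconds.
  double Stop()
  {
    assert(running_ && "Stop() called before Start()");
    const auto end = std::chrono::steady_clock::now();
    running_ = false;
    return std::chrono::duration<double>(end - start_).count();
  }
};
```

    Call Start() right before the operation and Stop() right after it. Like QueryPerformanceCounter(), this measures elapsed time, not CPU cycles.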

    On multitasking operating systems, you must take care that other threads do not interrupt your operation while it is being timed. Interrupted tests will produce misleading results. For many operating systems, the best way to tackle this is to put your thread to sleep right before your test starts. That way the test begins with a full thread quantum in which to run.

    Below you will find some code useful for writing your own profiler. The examples are independent of each other; each one stands on its own.
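
    The sleep-then-measure idea can be sketched in portable C++ as follows. The helper name TimeAfterSleep and the 10 ms default are assumptions chosen for illustration, and whether a short sleep really gives you a fresh quantum depends on your operating system's scheduler.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: sleep briefly, then time a single call of op().
// Returns the elapsed time of op() in seconds (the sleep is not included).
template <class Op>
double TimeAfterSleep(Op op,
                      std::chrono::milliseconds nap = std::chrono::milliseconds(10))
{
  assert(nap.count() >= 0);

  // Give up the rest of the current time slice before measuring.
  std::this_thread::sleep_for(nap);

  const auto start = std::chrono::steady_clock::now();
  op();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

    This only helps when the operation is short enough to finish within one quantum. Longer operations will be interrupted regardless, so for those it is more useful to repeat the test and compare the results.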


    Part II: Good Profiling Habits

    Now that you know how to profile an operation, you should also practice good profiling habits. For example, if you are testing how long a database query takes, one individual query may take a long time while another does not. Even if the queries are identical, the speed of a query will depend on the load on the database server. For this reason, it is good practice to:
    • Profile your operation with different parameters. For the database query example, you should use several different queries in a given test; that way you learn about the overall performance.
    • Profile your operation with unique parameters. For the database query example, you may find one particular query that takes a really long time. If this is the case, you can investigate to find out why.
    • Profile under controlled conditions. Often this means running your tests on minimally burdened systems. For the database query example, you will want to make sure that your application is the only one connected to the database server and that as few other processes are running as possible. This way, your results will more closely represent true performance -- your profiling results will also be more reproducible.
    • Profile multiple times -- not just once. How reproducible your results are will often tell you a lot about your operation and/or test.
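
    As a rough illustration of the last habit, the sketch below times an operation several times and reports the minimum, median, and maximum so you can see how reproducible the results are. The names TimingSummary, Summarize, and ProfileRepeated are hypothetical, and the choice of these three statistics is my own assumption, not something prescribed by this FAQ.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type: spread of the measured times, in seconds.
struct TimingSummary
{
  double min_s;
  double median_s;
  double max_s;
};

// Sort the samples and pick out the minimum, median, and maximum.
TimingSummary Summarize(std::vector<double> samples)
{
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  const double median = (n % 2 == 1)
      ? samples[n / 2]
      : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
  return TimingSummary{samples.front(), median, samples.back()};
}

// Time op() 'runs' times and summarize the individual measurements.
template <class Op>
TimingSummary ProfileRepeated(Op op, std::size_t runs)
{
  assert(runs > 0);
  std::vector<double> samples;
  samples.reserve(runs);
  for (std::size_t i = 0; i < runs; ++i)
  {
    const auto start = std::chrono::steady_clock::now();
    op();
    const auto end = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(end - start).count());
  }
  return Summarize(std::move(samples));
}
```

    A large gap between the minimum and the maximum usually means that something other than your operation (other processes, caching, server load) is influencing the test.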


    Credits

    Thanks Mick and Yves for your help with this FAQ.


    Last edited by Andreas Masur; July 24th, 2005 at 05:37 AM.

  2. #2
    Join Date
    May 2000
    Location
    KY, USA
    Posts
    18,652
    Code:
    #ifndef _PRECISIONTIMER_H_
    #define _PRECISIONTIMER_H_
    
    #include <windows.h>
    
    class CPrecisionTimer
    {
      LARGE_INTEGER lFreq, lStart;
    
    public:
      CPrecisionTimer()
      {
        QueryPerformanceFrequency(&lFreq);
      }
    
      inline void Start()
      {
        QueryPerformanceCounter(&lStart);
      }
      
      inline double Stop()
      {
        // Return duration in seconds...
        LARGE_INTEGER lEnd;
        QueryPerformanceCounter(&lEnd);
        return (double(lEnd.QuadPart - lStart.QuadPart) / lFreq.QuadPart);
      }
    };
    
    #endif // _PRECISIONTIMER_H_

    Last edited by Andreas Masur; July 24th, 2005 at 05:38 AM.

  3. #3
    Join Date
    May 2000
    Location
    KY, USA
    Posts
    18,652
    Code:
    #ifndef _RDTSCTIMER_H_
    #define _RDTSCTIMER_H_
    
    class CRdtscTimer
    {
      unsigned __int64 start, end;
    
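      // Emits the RDTSC opcode (0F 31) directly. The counter is returned in
      // EDX:EAX, which is where 32-bit MSVC expects a 64-bit return value.
      // Inline assembly like this only compiles with 32-bit Microsoft compilers.
      inline static unsigned __int64 _RDTSC()
      {
        _asm    _emit 0x0F
        _asm    _emit 0x31
      }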
    
    public:
      inline void Start()
      {
        start = _RDTSC();
      }
      
      inline unsigned __int64 Stop()
      {
        end = _RDTSC();
        return (end-start);
      }
    };
    
    #endif // _RDTSCTIMER_H_
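
    The _emit form above compiles only with 32-bit Microsoft compilers. A minimal sketch of the same timer built on the __rdtsc() compiler intrinsic (declared in <intrin.h> for MSVC and in <x86intrin.h> for GCC and Clang) might look like the following. The names ReadTicks, IntrinsicRdtscTimer, and the PROFILE_FAQ_HAVE_RDTSC macro are made up for illustration. The std::chrono fallback for non-x86 targets is my own assumption so that the sketch compiles everywhere; on that path the result is clock ticks, not CPU cycles.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define PROFILE_FAQ_HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>
  #define PROFILE_FAQ_HAVE_RDTSC 1
#else
  #include <chrono>
  #define PROFILE_FAQ_HAVE_RDTSC 0
#endif

// Read the time stamp counter on x86; otherwise fall back to steady_clock ticks.
inline std::uint64_t ReadTicks()
{
#if PROFILE_FAQ_HAVE_RDTSC
  return __rdtsc();
#else
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical counterpart of CRdtscTimer that uses the intrinsic.
class IntrinsicRdtscTimer
{
  std::uint64_t start_ = 0;
  bool running_ = false;

public:
  void Start()
  {
    start_ = ReadTicks();
    running_ = true;
  }

  // Return the number of ticks elapsed since Start().
  std::uint64_t Stop()
  {
    assert(running_ && "Stop() called before Start()");
    const std::uint64_t end = ReadTicks();
    running_ = false;
    return end - start_;
  }
};
```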

    Last edited by Andreas Masur; July 24th, 2005 at 05:38 AM.

  4. #4
    Join Date
    May 2000
    Location
    KY, USA
    Posts
    18,652
    Code:
    #define WIN32_LEAN_AND_MEAN
    
    #include <windows.h>
    #include <iostream>
    
    using namespace std;
    
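    // Emits the RDTSC opcode (0F 31); the 64-bit counter is returned in EDX:EAX.
    // This inline assembly only compiles with 32-bit Microsoft compilers.
    inline unsigned __int64 RDTSC(void)
    {
      _asm  _emit 0x0F
      _asm  _emit 0x31
    }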
    
    class TimerRDTSC
    {
      unsigned __int64  start_cycle;
      unsigned __int64  end_cycle;
    
    public:
      inline void Start()
      {
        start_cycle = RDTSC();
      }
    
      inline void Stop()
      {
        end_cycle = RDTSC();
      }
    
      unsigned __int64 Interval()
      {
        return end_cycle - start_cycle;
      }
    };
    
    class TimerPerformanceCounter
    {
      unsigned __int64  start_time;
      unsigned __int64  end_time;
    
    public:
      inline void Start()
      {
        QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&start_time));
      }
    
      inline void Stop()
      {
        QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&end_time));
      }
    
      unsigned __int64 Interval()
      {
        // Return duration in performance-counter ticks, not seconds
        // (divide by the QueryPerformanceFrequency() value to get seconds)
        return end_time - start_time;
      }
    };
    
    template<class Timer, class Test, unsigned SleepRepeat, unsigned QuantumRepeat=1>
    class ProfileSpeed
    {
      unsigned __int64 test_interval;
      Timer            timer;
      Test             test;
    
      void QuantumTest()
      {
        unsigned i;
        // Sleep first so that the timed runs start at the beginning of a fresh quantum
        Sleep(10);
        for (i=0; i < QuantumRepeat; ++i)
        {
          timer.Start();
          test.RunTest();
          timer.Stop();
          test_interval += timer.Interval();
        }
      }
    
    public:
      ProfileSpeed() : test_interval(0) {}
    
      void Run()
      {
        unsigned i;
        for (i=0; i < SleepRepeat; ++i)
        {
          QuantumTest();
        }
      }
    
      unsigned __int64 TestInterval() {return test_interval;}
    };
    
    class HelloTest
    {
    public:
      inline static void RunTest()
      {
        cout << "Hello world!" << endl;
      }
    };
    
    int main(int argc, char* argv[])
    {
      // Run HelloTest 50 times sleeping between each test, using TimerRDTSC
      ProfileSpeed<TimerRDTSC, HelloTest, 50> test1;
    
      // Run HelloTest 50 times sleeping between every 2 tests, using TimerRDTSC
      ProfileSpeed<TimerRDTSC, HelloTest, 25, 2> test2;
    
      // Switch test order if an argument is specified to the program
      if (argc == 1)
      {
        test1.Run();
        test2.Run();
      }
      else
      {
        test2.Run();
        test1.Run();
      }
    
      // Print the full 64-bit totals; a cast to unsigned would truncate them to 32 bits
      cout << "Test 1 Interval: " << test1.TestInterval() << endl;
      cout << "Test 2 Interval: " << test2.TestInterval() << endl;
    
      return 0;
    }

    Last edited by Andreas Masur; July 24th, 2005 at 05:39 AM.
