C++ Profiling: How do I determine the speed of a particular function or operation?
Q: How do I determine the speed of a particular function or operation?
A: Determining the speed of a particular operation is often known as profiling. The term "profiling" can also be used when other information about an operation's profile is queried -- such as the number of calls to a function. But for the purpose of this FAQ, we'll focus on determining the speed of a particular operation.
Part I: How to Profile
The best solution for profiling an operation is to use a profiler. Visual Studio has a good profiler. Linux has gprof (http://linuxgazette.net/100/vinayak.html). If your compiler doesn't have a profiler, it might be worthwhile purchasing a compiler that does if you will often be profiling your code.
If you have to get by without using a professional profiler, then you can usually get by embedding your own into your program. It may not always be as accurate or easy to use as a professional profiler, but then again, you get what you pay for. ;)
To profile an operation, you will need a high-precision timer. Pentium-compatible processors have an RDTSC (ReaD Time Stamp Counter) assembly instruction that reports the current processor tick count. Windows operating systems usually support the QueryPerformanceCounter() WinAPI call. Some computers, usually laptops and handhelds, vary their clock frequency to balance CPU usage and battery life. On these computers, RDTSC and QueryPerformanceCounter() can report different information. The difference between two RDTSC readings gives the number of CPU cycles it took to complete an operation, whereas the difference between two QueryPerformanceCounter() readings (converted to seconds with QueryPerformanceFrequency()) gives the total time it took to complete the operation. On newer processors with an invariant TSC, the counter ticks at a constant rate regardless of the current clock frequency, so the two are much closer in meaning. It is up to you to determine which is the more important piece of information for you.
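As a portable stand-in for the two timers above, here is a minimal sketch that times one call with std::chrono::steady_clock from the C++11 standard library. It is not RDTSC or QueryPerformanceCounter() itself; it measures elapsed time, not CPU cycles, and its resolution depends on your platform. The helper name time_once_ns is made up for this example.

```cpp
#include <chrono>

// Hypothetical helper (not part of any library): runs op() once and
// returns the elapsed time in nanoseconds.
// steady_clock is monotonic, so the result cannot go negative if the
// system clock is adjusted while the test is running.
template <typename Op>
long long time_once_ns(Op&& op) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    op();  // the operation being measured
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

You would call it with a lambda that wraps the operation, for example time_once_ns([&] { my_function(); }), where my_function is whatever you want to measure. Make sure the compiler cannot optimize the operation away, e.g. by using its result afterwards.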
On multithreaded operating systems, you must take care that other threads do not interrupt the timing of your operation. Interrupted tests will produce misleading results, because the time spent in the other threads is counted as part of your operation. For many operating systems, the best way to tackle this is to put your thread to sleep right before your test starts. That way your thread will usually start the test with a full thread quantum.
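The sleep-before-timing idea could look roughly like this in standard C++. This is only a sketch: the 1 ms sleep is an arbitrary choice, the helper name time_after_sleep is invented for this example, and whether you really get a fresh quantum afterwards depends on your operating system's scheduler.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: sleeps briefly so the thread gives up the rest of
// its current time slice, then times op() once.
template <typename Op>
std::chrono::nanoseconds time_after_sleep(Op&& op) {
    using clock = std::chrono::steady_clock;
    // Assumption: 1 ms is long enough for the scheduler to reschedule us.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    const auto start = clock::now();
    op();  // the operation being measured
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

This only helps for operations that are short compared to a thread quantum. A longer test can still be interrupted, which is one more reason to repeat it several times.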
Below you will find some code useful for writing your own profiler. The examples are independent of each other; each one stands alone.
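One possible building block for an embedded profiler is a small scoped timer that adds its lifetime to a running total and counts calls. This is a sketch under assumptions, not a definitive implementation: the names ProfileRecord and ScopedTimer are made up for this example, it uses std::chrono::steady_clock, and it is not thread-safe.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical record: accumulated time and call count for one operation.
struct ProfileRecord {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

// Hypothetical RAII timer: starts timing on construction and adds the
// elapsed time to the record on destruction, i.e. when the scope ends.
class ScopedTimer {
public:
    explicit ScopedTimer(ProfileRecord& record)
        : record_(record), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto stop = std::chrono::steady_clock::now();
        record_.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start_);
        ++record_.calls;
    }

    // Copying a running timer would count the same scope twice.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    ProfileRecord& record_;
    std::chrono::steady_clock::time_point start_;
};
```

To use it, keep one ProfileRecord per operation and put ScopedTimer timer(record); on the first line of the function or block you want to measure. Dividing record.total by record.calls afterwards gives the average time per call.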
Part II: Good Profiling Habits
Now that you know how to profile an operation, you should also practice good profiling habits. For example, if you are testing how long a database query takes, one individual query may take a long time while another does not. Even if the queries are identical, the speed of a query will depend on the load of the database server. For this reason, it is good practice to:
- Profile your operation with different parameters. For the database query example, you should use several different queries for a given test, so that you learn about the overall performance and not just one case.
- Profile your operation with unique parameters. For the database query example, you may find one particular query that takes a really long time. If this is the case, you can investigate to find out why.
- Profile under controlled conditions. Often this means running your tests on minimally burdened systems. For the database query example, you will want to make sure that your application is the only one connected to the database server and that as few other processes are running as possible. This way, your results will more closely represent true performance -- your profiling results will also be more reproducible.
- Profile multiple times -- not just once. How reproducible your results are will often tell you a lot about your operation and/or test.
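The last habit above can be sketched as a small helper that runs an operation several times and reports the minimum, median and maximum time. The names TimingSummary, summarize and profile_repeated are invented for this example, and the choice of these three statistics is only one reasonable option. A large gap between minimum and maximum suggests the test was disturbed or the operation itself is not very reproducible.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical summary of repeated timings, in nanoseconds.
struct TimingSummary {
    double min_ns;
    double median_ns;
    double max_ns;
};

// Sorts the samples and picks out min, median and max.
// Returns all zeros if there are no samples.
TimingSummary summarize(std::vector<double> samples) {
    if (samples.empty()) return {0.0, 0.0, 0.0};
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    return {samples.front(), median, samples.back()};
}

// Runs op() 'runs' times, timing each run separately.
template <typename Op>
TimingSummary profile_repeated(Op&& op, std::size_t runs) {
    using clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = clock::now();
        op();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    return summarize(std::move(samples));
}
```

To cover the other habits as well, you could call profile_repeated once per parameter set (for example once per query) and compare the summaries.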
Thanks Mick and Yves for your help with this FAQ.