I've been working on writing a memory bandwidth benchmark for a while and needed to use a monotonic timer to compute accurate timings. I have since learned that this is more challenging to do that I initially expected and each platform has a different way of doing it.
I was trying to determine the run time of a function and wanted the most
precise and accurate information possible.
First, I started by using
struct timeval before, after; gettimeofday(&before, NULL); function(); gettimeofday(&after, NULL);
Unfortunately, this will not always work since it is dependent on the system clock. If some other process changes the system time between the two calls to
gettimeofday, it could report inaccurate results. We need a function that returns a monotonically increasing value.
Luckily, such a function exists on Linux. We can use
struct timespec before, after, total; clock_gettime(CLOCK_MONOTONIC, &before); function(); clock_gettime(CLOCK_MONOTONIC, &after);
Unfortunately, this doesn't work everywhere! Each platform has its own way
accessing a high resolution monotonic counter. On Mac OS X we use
and on Windows we use
On x86 machines where none of these are available, we can resort directly to
rdtsc. This is a special instruction that returns the Time Stamp Counter, the number of cycles since reset. Unfortunately, we have to be very careful when using this instruction. This white paper offers a lot of good advice on how to use it, but in short we have to take care to prevent instruction reordering. In the following code, the reordering of the
fdiv after the
rdtsc would lead to inaccurate timing results:
rdtsc fdiv # Or another slow instruction rdtsc
rdtscp prevents instructions that occur before the
rdtsc from being reordered afterwards. Unfortunately, instructions that occur after the
rdtscp can still be reordered before it. The following code could have
fdiv reordered before the
rdtscp, leading to inaccurate results:
rdtscp call function rdtscp fdiv
The suggested way to avoid the reordering is to use the
cpuid instruction, which has the effect of preventing all instruction reordering around it. While this is a slow instruction, we can be a bit clever and ensure that we never have to execute it while between the times when we query the counter.
The ideal timing code looks something like this:
cpuid rtdsc # Save %edx and %eax (the output of rtdsc). call function rdtscp # Save %edx and %eax. cpuid
A cross platform timer
Assembling all this information, I attempted to write a cross-platform utility for fine grained timing. A few late nights and a file full of
#ifdefs later, I have the start of such a utility. Currently, it supports the function
monotonic_seconds which returns the seconds from some unspecified start point as a double precision floating point number. In the future, I'll add support for
monotonic_cycles as a static inline function in the header and
cycles_to_seconds as a way to convert cycles to seconds. Check it out here!