Estimating CPU Speed
Monday, November 29, 2010
Factor contains a nice DSL for writing assembly code. I thought it would be fun to investigate how it works by accessing the CPU’s Time Stamp Counter to estimate CPU speed.
The X86 instruction for accessing the timestamp value (incremented every
CPU tick) is called RDTSC (a 2-byte instruction 0x0f 0x31
). Some C
code for calling the 32-bit or 64-bit versions of this looks like:
#if defined(__i386__)
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long int x;
__asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
return x;
}
#elif defined(__x86_64__)
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
#endif
Factor provides utilities for calling arbitrary assembly code in the alien vocabulary. Using this, we can create corresponding Factor code (supporting both 32 and 64 bits):
USING: alien alien.c-types cpu.x86.assembler
cpu.x86.assembler.operands system ;
HOOK: rdtsc cpu ( -- n )
M: x86.32 rdtsc
longlong { } cdecl [
RDTSC
] alien-assembly ;
M: x86.64 rdtsc
longlong { } cdecl [
RAX 0 MOV
RDTSC
RDX 32 SHL
RAX RDX OR
] alien-assembly ;
You can see in the implementation above how Factor uses the type of CPU
(contained in the cpu
variable) to dispatch on the correct version of
the rdtsc
word.
To estimate CPU speed, we will need to define two “benchmarking” words:
- Calculate the number of CPU ticks it takes to execute some Factor code.
- Calculate the time it takes to execute some Factor code.
USING: kernel math system ;
: #ticks ( quot -- n )
rdtsc [ call rdtsc ] dip - ; inline
: #nanos ( quot -- n )
nano-count [ call nano-count ] dip - ; inline
We can then create a “busy loop” that runs for some time, then estimates CPU speed as ticks-per-second:
: busy-loop ( -- )
100000000 [ 1 - dup 0 > ] loop drop ;
: cpu-speed ( -- n )
[ [ busy-loop ] #nanos ] #ticks swap / 1000000000.0 * ;
Running this on my MacBook Pro (with a 2.66 GHz processor) produces this estimate:
IN: scratchpad cpu-speed .
2660324566.190773
The rdtsc
and #ticks
words are distributed with Factor as
instruction-count
and count-instructions
and are available in the
cpu.x86.features
vocabulary. The #nanos
word is called benchmark
and is available in the tools.time
vocabulary.