Sunday, March 6, 2011
Sometimes profiling an application will show hotspots in the oddest
places. For some types of network applications that process huge volumes
of events, calls to
gettimeofday() can become a
bottleneck. If each event needs to have a timestamp generated, this
could mean thousands of system
calls in a very short time,
all producing essentially the same value. If the “actual” time is not so
important, performance could be gained by relaxing the requirement that
all events received in a
loop have the same timestamp.
One way to do this is to cache the timestamp after waking up, and use
that timestamp for all events received within the I/O loop.
Unfortunately, the I/O loop can take a long time to process - resulting
in timestamps that diverge from “actual” time by small or large (and
unpredictable) amounts. Perhaps a better way would be to cache the
gettimeofday() with a resolution of one millisecond.
gettimeofday() is called by
now word from the
calendar vocabulary. Let’s try and build a
fast-now word that adds caching:
USING: calendar kernel math namespaces system ;
Our cache resolution is one millisecond (or 1,000,000 nanoseconds):
CONSTANT: cache-duration 1000000
We will be keeping the cached value and the expiration time:
SYMBOL: cache-value SYMBOL: cache-until
Given the current time in nanoseconds, we can check to see if the cached value has expired:
: cache-expired? ( nanos -- ? ) cache-until get-global 0 or > ;
If it has, we can reset the cache expiration:
: reset-cache ( nanos -- ) cache-duration + cache-until set-global ;
And update the cached value (the result of calling
: update-cache ( nanos -- timestamp ) reset-cache now [ cache-value set-global ] keep ;
fast-now word is as easy as:
- Get the current monotonically increasing
- Check if the cache has expired.
- Update the cache if expired, otherwise return the cached value.
: fast-now ( -- timestamp ) nano-count dup cache-expired? [ update-cache ] [ drop cache-value get-global ] if ;
Note: we use a monotonic
because it is a much faster operation than calling
Another fast way would be to use the instruction
however we would need to estimate cpu
to know how many instructions the cached value should survive for.
We can try this out, and see how much faster it is:
IN: scratchpad [ 1000 [ now drop ] times ] benchmark . 19706313 IN: scratchpad [ 1000 [ fast-now drop ] times ] benchmark . 356574
So, 1,000 calls to
now took 19.706 milliseconds, and only 0.356
fast-now (for a speedup of 55x)! In the case of a
single call, both
fast-now take roughly the same time. Not a
bad improvement, right?
The code (and some tests) for this is on my GitHub.