Fast Now
Sunday, March 6, 2011
Sometimes profiling an application will show hotspots in the oddest
places. For some types of network applications that process huge volumes
of events, calls to
gettimeofday()
can become a
bottleneck. If each event needs to have a timestamp generated, this
could mean thousands of system
calls in a very short time,
all producing essentially the same value. If the “actual” time is not so
important, performance could be gained by relaxing the requirement that
all events received in a select()
loop have the same timestamp.
One way to do this is to cache the timestamp after waking up, and use
that timestamp for all events received within the I/O loop.
Unfortunately, the I/O loop can take a long time to process - resulting
in timestamps that diverge from “actual” time by small or large (and
unpredictable) amounts. Perhaps a better way would be to cache the
result of gettimeofday()
with a resolution of one millisecond.
fast-now
In Factor, gettimeofday()
is called by
the now
word from the calendar
vocabulary. Let’s try and build a
fast-now
word that adds caching:
USING: calendar kernel math namespaces system ;
Our cache resolution is one millisecond (or 1,000,000 nanoseconds):
CONSTANT: cache-duration 1000000
We will be keeping the cached value and the expiration time:
SYMBOL: cache-value
SYMBOL: cache-until
Given the current time in nanoseconds, we can check to see if the cached value has expired:
: cache-expired? ( nanos -- ? )
cache-until get-global 0 or > ;
If it has, we can reset the cache expiration:
: reset-cache ( nanos -- )
cache-duration + cache-until set-global ;
And update the cached value (the result of calling now
):
: update-cache ( nanos -- timestamp )
reset-cache now [ cache-value set-global ] keep ;
Building the fast-now
word is as easy as:
- Get the current monotonically increasing
nano-count
. - Check if the cache has expired.
- Update the cache if expired, otherwise return the cached value.
: fast-now ( -- timestamp )
nano-count dup cache-expired? [ update-cache ] [
drop cache-value get-global
] if ;
Note: we use a monotonic
timer
because it is a much faster operation than calling gettimeofday()
.
Another fast way would be to use the instruction
counter,
however we would need to estimate cpu
speed
to know how many instructions the cached value should survive for.
Try It
We can try this out, and see how much faster it is:
IN: scratchpad [ 1000 [ now drop ] times ] benchmark .
19706313
IN: scratchpad [ 1000 [ fast-now drop ] times ] benchmark .
356574
So, 1,000 calls to now
took 19.706 milliseconds, and only 0.356
milliseconds for fast-now
(for a speedup of 55x)! In the case of a
single call, both now
and fast-now
take roughly the same time. Not a
bad improvement, right?
The code (and some tests) for this is on my GitHub.