Bit Test
Friday, June 19, 2015
Factor has a
bit?
generic that is used to test an integer to see if a particular bit is
set. When operating on
fixnum
integers, this is implemented by the fixnum-bit?
word:
: fixnum-bit? ( x n -- ? )
{ fixnum fixnum } declare
dup 0 >= [ neg shift even? not ] [ 2drop f ] if
Many CPU architectures have instructions for performing a Bit Test. On x86, the BT instruction is available. Below, we are going to implement a compiler intrinsic in the hopes of speeding up this operation in Factor.
Intrinsic
In
cpu.architecture,
add a generic %bit-test
based on our CPU:
HOOK: %bit-test cpu ( dst src1 src2 temp -- )
In cpu.x86,
implement %bit-test
on x86 (returning a boolean using a temporary
register and the CMOVB
instruction to check the
carry flag which holds the
result of the bit test):
M:: x86 %bit-test ( dst src1 src2 temp -- )
src1 src2 BT
dst temp \ CMOVB (%boolean) ;
In
compiler.cfg.instructions,
add a ##bit-test
instruction:
FOLDABLE-INSN: ##bit-test
def: dst/tagged-rep
use: src1/int-rep src2/int-rep
temp: temp/int-rep ;
In
compiler.codegen,
we link the ##bit-test
instruction with the %bit-test
word.
CODEGEN: ##bit-test %bit-test
In
compiler.cfg.intrinsics,
enable replacing fixnum-bit?
with the %bit-test
intrinsic:
: enable-bit-test ( -- )
{
{ fixnum-bit? [ drop [ ^^bit-test ] binary-op ] }
} enable-intrinsics ;
Disassemble
The old assembly using fixnum-bit?
looked like this:
000000010ea00250: mov [rip-0x1ff256], eax
000000010ea00256: sub rsp, 0x8
000000010ea0025a: call 0x10eedf3b0 (integer>fixnum-strict)
000000010ea0025f: mov rax, [r14]
000000010ea00262: cmp rax, 0x0
000000010ea00266: jl 0x10ea002a7 (M\ fixnum bit? + 0x57)
000000010ea0026c: neg rax
000000010ea0026f: mov [r14], rax
000000010ea00272: call 0x10ebec060 (fixnum-shift)
000000010ea00277: mov rax, [r14]
000000010ea0027a: test rax, 0x10
000000010ea00281: mov rax, 0x1
000000010ea0028b: mov rbx, 0x1010eff4c
000000010ea00295: cmovnz rax, rbx
000000010ea00299: mov [r14], rax
000000010ea0029c: mov [rip-0x1ff2a2], eax
000000010ea002a2: add rsp, 0x8
000000010ea002a6: ret
000000010ea002a7: sub r14, 0x8
000000010ea002ab: mov qword [r14], 0x1
000000010ea002b2: mov [rip-0x1ff2b8], eax
000000010ea002b8: add rsp, 0x8
000000010ea002bc: ret
The new assembly looks like this with BT:
000000010e656ec0: mov [rip-0x293ec6], eax
000000010e656ec6: sub rsp, 0x8
000000010e656eca: call 0x10e467ea0 (integer>fixnum-strict)
000000010e656ecf: mov rax, [r14]
000000010e656ed2: mov rbx, [r14-0x8]
000000010e656ed6: sar rbx, 0x4
000000010e656eda: sar rax, 0x4
000000010e656ede: bt rbx, rax
000000010e656ee2: mov rbx, 0x1
000000010e656eec: mov rcx, 0x1018f169c
000000010e656ef6: cmovb rbx, rcx
000000010e656efa: sub r14, 0x8
000000010e656efe: mov [r14], rbx
000000010e656f01: mov [rip-0x293f07], eax
000000010e656f07: add rsp, 0x8
000000010e656f0b: ret
Performance
Turns out that this speeds up fixnum-bit?
by a lot. Using a simple
test case:
: bench-fixnum-bit ( x n -- ? )
{ fixnum fixnum } declare
[ 0 100,000,000 ] 2dip
'[ _ _ bit? [ 1 + ] when ] times ;
Our old version was decently fast:
IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time
Running time: 0.821433838 seconds
But our new version is much faster!
IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time
Running time: 0.239439108 seconds
This has been committed to the development version and is available now.