Transducers
Monday, June 3, 2024
One of the distinct elements of Clojure that gets discussed on occasion are Clojure Transducers (not to be confused with Clojure Reducers. Specifically, transducers are a type of transformation that allows for composable algorithms using two primary concepts:
A reducing function is the kind of function you’d pass to reduce - it is a function that takes an accumulated result and a new input and returns a new accumulated result, with a reducing function signature of:
whatever, input -> whatever
A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another, with a transducer signature of:
(whatever, input -> whatever) -> (whatever, input -> whatever)
In Clojure, the transducer is defined to have 3 different arity functions:
Init (arity 0) - should call the init arity on the nested transform
rf
, which will eventually call out to the transducing process.Step (arity 2) - this is a standard reduction function but it is expected to call the
rf
step arity 0 or more times as appropriate in the transducer. For example, filter will choose (based on the predicate) whether to callrf
or not. map will always call it exactly once. cat may call it many times depending on the inputs.Completion (arity 1) - some processes will not end, but for those that do (like transduce), the completion arity is used to produce a final value and/or flush state. This arity must call the
rf
completion arity exactly once.
Similarly to Haskell’s stream fusion, this allows for a benefit of eliding intermediate sequences when multiple operations are applied. However, in some ways, they are also quite a bit more powerful.
I wanted to discuss one potential way to build a transducer-like thing in Factor.
Building a transducing function
Our reducing functions are any defined as ( prev elt -- next )
– used
with reduce.
Our transducing functions have a similar stack effect, but with several rules:
- if
elt
isnull
, skip applying the reducing function - if
next
isreduced
, we can early exit and return it - if
next
isnull
, we keep the previous result
Note: we have a simple
reduced
wrapper type to indicate an early exit returnsobj
:TUPLE: reduced obj ;
We can build a word that converts a reducing function to a transducing function:
: xf ( rf: ( prev elt -- next ) -- xf )
'[ { [ over reduced? ] [ dup null eq? ] } 0|| [ drop ] _ if ] ;
A common transducing function is a mapping operation:
: xmap ( xf quot: ( elt -- newelt ) -- xf' )
'[ @ dup { [ reduced? ] [ null eq? ] } 1|| _ unless ] xf ;
Building transduce
That’s enough definition for us to build our (transduce)
word, operating on
an identity
and then applying the transducing function until a result is
achieved.
: (transduce) ( seq identity xf: ( prev elt -- next ) -- result )
swapd '[
_ keepd over null eq? [ nip f ] [ drop dup reduced? ] if
] find 2drop dup reduced? [ obj>> ] when ; inline
And, for convenience we build ourselves a transduce
word that takes a
quotation, which
is called to compose a transducing function (using
make to build up an
initialize quotation) and returning a step quotation, and then returning a
block of code that calls (transduce)
with the null
identity.
MACRO: transduce ( quot: ( xf -- xf' ) -- result )
[ [ nip ] swap call ] [ ] make swap '[ @ null _ (transduce) ] ;
Note: currently this does not implement a complete quotation, but perhaps some future transducing function will require that and we can add it.
This word only operates on sequences, but one could imagine adding support to apply transducers to other kinds of streams of objects including infinite lazy lists.
Building transducers
Using this logic, we can build a bunch of useful transducers with a small amount of code.
- A filter transducer skips certain values:
: xfilter ( xf quot: ( elt -- ? ) -- xf' )
'[ dup @ [ drop null ] unless ] xmap ;
- A sum transducer adds up a series of numbers:
: xsum ( xf -- xf' )
[let f :> n! [ 0 n! ] % [ n + n! n ] xmap ] ;
- A prettyprint transducer prints out intermediate values for debugging:
: xpprint ( xf -- xf' ) [ dup . ] xmap ;
- A collector transducer collects all results into a vector:
: xcollect ( xf -- xf' )
[let f :> v! [ V{ } clone v! ] % [ v [ push ] keep ] xmap ] ;
- A histogram transducer counts values:
: xhistogram ( xf -- xf' )
[let f :> h! [ H{ } clone h! ] % [ h [ inc-at ] keep ] xmap ] ;
- A unique transducer passes through only unique values:
: xunique ( xf -- xf' )
[let f :> s! [ HS{ } clone s! ] %
'[ [ s ?adjoin ] keep null ? ] xmap
] ;
And a lot more…
Performance
Some quick performance tests will show the benefit of fusing multiple
operations together. The artificial test case is going to be a word that
operates on a sequence, returning the sum of the
square of prime
numbers,
if it is bigger than 5,000
.
- Using normal sequence operations:
: foo ( seq -- n )
[ prime? ] filter
[ sq ] map
[ 5,000 > ] filter
sum ;
- Using our new transducers:
: bar ( seq -- n )
[
[ prime? ] xfilter
[ sq ] xmap
[ 5,000 > ] xfilter
xsum
] transduce ;
And then see how it performs!
IN: scratchpad 1,000,000 100 randoms
[ [ foo ] time ]
[ [ bar ] time ] bi assert=
Running time: 0.037218208 seconds
Running time: 0.028653417 seconds
In this case, it’s about 25% faster. Depending on your use-case and how many intermediate sequence operations the transducer is able to eliminate, it could represent a nice speedup as well as potentially a more elegant method.
Some things we can improve upon:
- provide length hints for cases where we are doing map operations
- change the
reduced
andnull
logic to simplify transducer implementations - investigate using
with-return
to early exit instead of
reduced
objects - consider doing initialization differently than using local variables
- adding completion logic to do things like free up memory at the end of computation
One of our contributors that uses Codewars experimented with a different approach to transducer words in Factor that is worth exploring as well.
This is available on my GitHub.