Bend
Wednesday, June 5, 2024
The Bend programming language is a massively parallel, high-level programming language. It has some really neat dataflow concepts that make it inherently parallel because it infers the flow of variables and then naturally performs divide and conquer to utilize all available computing resources, both CPU and GPU.
This results in some impressive speedups:
Yet, since it uses a divide-and-conquer approach, which is inherently parallel, Bend will run it multi-threaded. Some benchmarks:
CPU, Apple M3 Max, 1 thread: 12.15 seconds
CPU, Apple M3 Max, 16 threads: 0.96 seconds
GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds
That’s a 57x speedup by doing nothing.
One of our current challenges in Factor is that we use green threads and are effectively single-threaded for computations. We have a non-blocking I/O implementation as well as integration with UI event loops, so it ends up feeling more concurrent than it actually is.
We hope to solve that in the future, both by supporting native threads and also by a multi-vm approach that is somewhat equivalent to the Python multiprocessing module. Some early ideas have been contributed towards this goal, but so far it is not complete.
Bend Example
One of the Bend examples that we are going to try and implement in
Factor uses a bend
and fork
operations which
creates concurrency points in a kind of fork-join
model:
# given a shader, returns a square image
def render(depth, shader):
bend d = 0, i = 0:
when d < depth:
color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
else:
width = depth / 2
color = shader(i % width, i / width)
return color
# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
bend i = 0:
when i < 5000:
color = fork(i + 1)
else:
color = 0x000001
return color
# renders a 256x256 image using demo_shader
def main:
return render(16, demo_shader)
Factor Syntax
A few days ago, Keldan Chapman contributed an early version of a bend vocabulary. And then, we spent some time working together on the Factor Discord server on some alternative implementations, one of which I want to go over below.
We are going to create a BEND[
quotation that provides support for a
fork
word that implicitly recurses (but not currently through CPU or GPU
parallelism) and then joins the results together. We create an uninterned
word to hold our
computation and use
with-words
to make the fork
word only valid in the parsing scope to recurse.
SYNTAX: BEND[
gensym dup
dup "fork" associate [ parse-quotation ] with-words
dup infer define-declared suffix! ;
A simple use-case of this might be to compute the factorial through recursion:
: factorial ( n -- n! )
BEND[ dup 1 > [ dup 1 - fork * ] when ] ;
Or, perhaps computing Fibonacci numbers through recursion:
: fibonacci ( n -- fib )
BEND[ dup 1 > [ [ 1 - fork ] [ 2 - fork ] bi + ] when ] ;
Factor Example
With that syntax defined, we can now translate the above example to the following form:
: render ( depth shader -- color )
0 0 BEND[
[let :> ( depth shader d i )
d depth < [
depth shader d 1 + i 2 * fork
depth shader d 1 + i 2 * 1 + fork
2array
] [
i depth 2/ /mod swap shader call( x y -- color )
] if
]
] ;
:: demo-shader ( x y -- color )
0 BEND[ dup 5000 < [ 1 + fork ] [ drop 0x000001 ] if ] ;
: main ( -- color )
16 [ demo-shader ] render ;
Obviously, this fork
doesn’t (currently) increase performance, and it might
shadow the
fork from
the unix.process
vocabulary, but
it could represent the start of a new computational approach in Factor.
I can’t wait to see more from the Bend programming language and I also hope to see more of these ideas appearing and improving in Factor!