Bend

Wednesday, June 5, 2024

The Bend programming language is a massively parallel, high-level programming language. It has some really neat dataflow concepts that make it inherently parallel because it infers the flow of variables and then naturally performs divide and conquer to utilize all available computing resources, both CPU and GPU.

This results in some impressive speedups:

Yet, since it uses a divide-and-conquer approach, which is inherently parallel, Bend will run it multi-threaded. Some benchmarks:

CPU, Apple M3 Max, 1 thread: 12.15 seconds

CPU, Apple M3 Max, 16 threads: 0.96 seconds

GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds

That’s a 57x speedup by doing nothing.

One of our current challenges in Factor is that we use green threads and are effectively single-threaded for computations. We have a non-blocking I/O implementation as well as integration with UI event loops, so it ends up feeling more concurrent than it actually is.

We hope to solve that in the future, both by supporting native threads and also by a multi-vm approach that is somewhat equivalent to the Python multiprocessing module. Some early ideas have been contributed towards this goal, but so far it is not complete.

Bend Example

One of the Bend examples that we are going to try and implement in Factor uses a bend and fork operations which creates concurrency points in a kind of fork-join model:

# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)

Factor Syntax

A few days ago, Keldan Chapman contributed an early version of a bend vocabulary. And then, we spent some time working together on the Factor Discord server on some alternative implementations, one of which I want to go over below.

We are going to create a BEND[ quotation that provides support for a fork word that implicitly recurses (but not currently through CPU or GPU parallelism) and then joins the results together. We create an uninterned word to hold our computation and use with-words to make the fork word only valid in the parsing scope to recurse.

SYNTAX: BEND[
    gensym dup
    dup "fork" associate [ parse-quotation ] with-words
    dup infer define-declared suffix! ;

A simple use-case of this might be to compute the factorial through recursion:

: factorial ( n -- n! )
    BEND[ dup 1 > [ dup 1 - fork * ] when ] ;

Or, perhaps computing Fibonacci numbers through recursion:

: fibonacci ( n -- fib )
    BEND[ dup 1 > [ [ 1 - fork ] [ 2 - fork ] bi + ] when ] ;

Factor Example

With that syntax defined, we can now translate the above example to the following form:

: render ( depth shader -- color )
    0 0 BEND[
        [let :> ( depth shader d i )
            d depth < [
                depth shader d 1 + i 2 * fork
                depth shader d 1 + i 2 * 1 + fork
                2array
            ] [
                i depth 2/ /mod swap shader call( x y -- color )
            ] if
        ]
    ] ;

:: demo-shader ( x y -- color )
    0 BEND[ dup 5000 < [ 1 + fork ] [ drop 0x000001 ] if ] ;

: main ( -- color )
    16 [ demo-shader ] render ;

Obviously, this fork doesn’t (currently) increase performance, and it might shadow the fork from the unix.process vocabulary, but it could represent the start of a new computational approach in Factor.

I can’t wait to see more from the Bend programming language and I also hope to see more of these ideas appearing and improving in Factor!