Re: Factor

Factor: the language, the theory, and the practice.

Compressed Images

Monday, May 27, 2024

A recent contribution by @nomennescio enables support for loading compressed images in Factor!

I’m not talking about graphical images, but rather about the binary image that Factor loads at startup. Specifically, the binary image contains mainly the data and code heaps, as well as some special objects that are used to initialize the Factor libraries.

The compressed image support uses the image_header to indicate that the Factor binary image is compressed and should be decompressed before loading, rather than loaded directly. We currently use the Zstandard compression method, which offers a reasonable balance of speed and compression ratio.
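Zstandard is also available as a standalone command-line tool, which is a quick way to get a feel for the speed and ratio it offers on a large file. A minimal sketch, assuming the zstd CLI is installed (note that this does not produce a loadable Factor image — Factor writes its own image_header and handles compression through its own tools):

```shell
# Illustration only: round-trip an ordinary file through the zstd CLI.
seq 1 200000 > sample.txt                 # a large, compressible stand-in file
zstd -q sample.txt -o sample.txt.zst      # compress
od -An -tx1 -N4 sample.txt.zst            # zstd frame magic: 28 b5 2f fd
zstd -q -d sample.txt.zst -o sample.out   # decompress
cmp sample.txt sample.out && echo "round trip OK"
```

The first four bytes of any Zstandard frame are the magic number 28 b5 2f fd, which is one conventional way for a loader to tell a compressed file from an uncompressed one.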


The released Factor binary image, which contains a reasonable default set of vocabularies, is around 127 megabytes uncompressed and 20 megabytes compressed:

127M    factor.image
20M     factor.image.compressed

One of the criticisms that we have received in the past is that a load-all image, which loads the over 300,000 lines of Factor code in the main Factor repository, can be almost 500 megabytes. When compressed, that gets significantly reduced down to 66 megabytes!

483M    factor.load-all.image
66M     factor.load-all.image.compressed
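As a quick back-of-the-envelope check of the sizes above (du rounds to whole megabytes, so treat these as approximate), a couple of awk one-liners give the compression ratios:

```shell
# Approximate compression ratios from the reported sizes.
awk 'BEGIN { printf "%.2f\n", 20/127 }'   # default image: 0.16
awk 'BEGIN { printf "%.2f\n", 66/483 }'   # load-all image: 0.14
```

In both cases the compressed image is roughly a sixth to a seventh of the original size.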


This is not without some cost: there is a small runtime delay when starting the Factor binary using a compressed image. For example, we can compare uncompressed and compressed results of loading a default image and doing nothing:

$ time ./factor -i=factor.image -e=""

real    0m0.105s
user    0m0.048s
sys     0m0.057s

$ time ./factor -i=factor.image.compressed -e=""

real    0m0.281s
user    0m0.230s
sys     0m0.050s

Or compare the results when using a load-all image:

$ time ./factor -i=factor.load-all.image -e=""

real    0m0.515s
user    0m0.258s
sys     0m0.257s

$ time ./factor -i=factor.load-all.image.compressed -e=""

real    0m1.042s
user    0m0.809s
sys     0m0.233s

That is not quite an apples-to-apples comparison: the uncompressed version uses mmap and likely does not page the whole image in, while the compressed version must be fully read and decompressed up front. Still, it gives you a sense of where this feature is heading.


If you run "hello-world" deploy you can create a relatively small deployed binary that prints Hello world when run. This can then be compressed manually to see the difference in size (~25% smaller) with a negligible difference in runtime:

$ du -h hello-world*
1.8M    hello-world
1.3M    hello-world-compressed

$ time ./hello-world
Hello world

real    0m0.005s
user    0m0.001s
sys     0m0.004s

$ time ./hello-world-compressed
Hello world

real    0m0.005s
user    0m0.001s
sys     0m0.003s

Some additional work needs to be done to add a checkbox in the deploy tools for creating binaries using compression. However, this already represents a big win for anyone who is more concerned about file size than startup latency.

Compression is currently handled by the tools.image.compressor vocabulary and decompression by the tools.image.uncompressor vocabulary. This is a new feature and may change as it evolves, but it is a neat preview of things to come in the next release.

Give it a try!