Re: Factor

Factor: the language, the theory, and the practice.

Dragonbox

Sunday, February 11, 2024

#math

One of the challenging problems in computer science is to efficiently take a binary representation of a floating-point number and convert it to the “shortest decimal representation” that will roundtrip back to the same floating-point number when it is parsed.

A few days ago, one of the members of the Factor Discord server posted about an issue they were having where three separate floating-point numbers printed as the same decimal value:

IN: scratchpad 0x1.1ffffffffffffp7 .
144.0

IN: scratchpad 0x1.2p7 .
144.0

IN: scratchpad 0x1.2000000000001p7 .
144.0

Well, that’s not ideal!

And you can see that in other languages like Python, they parse properly into three distinct values:

>>> float.fromhex('0x1.1ffffffffffffp7')
143.99999999999997

>>> float.fromhex('0x1.2p7')
144.0

>>> float.fromhex('0x1.2000000000001p7')
144.00000000000003

In the process of investigating this issue, I re-discovered a few algorithms that have been developed to do this. There is a neat project called Drachennest that investigates the relative performance of several of these algorithms and claims:

Grisu3, Ryu, Schubfach, and Dragonbox are optimal, i.e. the output string

  1. rounds back to the input number when read in,
  2. is as short as possible,
  3. is as close to the input number as possible.

Well, it turns out that the “Dragonbox” algorithm is one of the current best and is described in a paper called A New Floating-Point Binary-to-Decimal Conversion Algorithm as well as a fantastic reference implementation of Dragonbox in C++.

I was able to quickly fix the bug by temporarily using a “modern formatting library” called {fmt} that works in C++11 and provides a version of the C++20 function std::format, but thought it would be a good idea to implement the Dragonbox algorithm someday in pure Factor code and filed an issue to track that idea.

Well, one of our awesome contributors, Giftpflanze, jumped in and implemented Dragonbox in Factor – providing a very readable and understandable and nicely concatenative version – and it was merged today!

Not only does this solve the issue of decimal representation of floats, but it provides quite a large speedup to our float-parsing benchmark:

Currently in Factor 0.99:

IN: scratchpad gc [ parse-float-benchmark ] time
Running time: 3.181906583 seconds

And now after the patch:

IN: scratchpad gc [ parse-float-benchmark ] time
Running time: 0.378132792 seconds

Very impressive!

I’m excited to say that this is now available in the development version of Factor.