Re: Factor

Factor: the language, the theory, and the practice.

Proquint

Tuesday, February 7, 2023

#parsing

A few days ago, Ciprian Dorin Craciun wrote a blog post about binary-to-text encoding that covers the “state of the art and missed opportunities” in various encoding schemes. That post introduced me to the Proquint encoding, whose name stands for “PRO-nounceable QUINT-uplets”.

In the Factor programming language, we have enjoyed implementing many encoding and decoding methods, including base16, base24, base32, base32hex, base32-crockford, base36, base58, base62, base64, base85, base91, uu, and many others. I thought it would be fun to add a quick implementation of Proquint.

Like other encodings, it uses an alphabet, here split into 16 consonants and 4 vowels:

CONSTANT: consonant "bdfghjklmnprstvz"

CONSTANT: vowel "aiou"

Each 16-bit number is encoded as a 5-character block of alternating consonants and vowels, where each of the three consonants carries 4 bits and each of the two vowels carries 2 bits (3 × 4 + 2 × 2 = 16):

: >quint16 ( m -- str )
    5 [
        even? [
            [ -4 shift ] [ 4 bits consonant nth ] bi
        ] [
            [ -2 shift ] [ 2 bits vowel nth ] bi
        ] if
    ] "" map-integers-as reverse nip ;
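As a worked example of that bit layout, here is a small listener sketch. It assumes the definitions above are loaded through the proquint vocabulary mentioned at the end of this post; if >quint16 is not exported in your build, paste the definitions above into the listener first.

```factor
USING: prettyprint proquint ;

! Assumption: >quint16 is reachable from the proquint vocabulary.
! 0x7f00 = 0111 11 1100 00 0000  (4-2-4-2-4 bits, high to low)
! index     7   3   12   0    0
! letter    l   u    s   a    b
0x7f00 >quint16 .   ! "lusab"
0x0001 >quint16 .   ! "babad"
```

The word peels fields off the low end first, which is why the result is reversed before it is returned.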

A 32-bit number is encoded by joining two 16-bit blocks with a hyphen:

: >quint32 ( m -- str )
    [ -16 shift ] keep [ 16 bits >quint16 ] bi@ "-" glue ;
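A quick listener check, again assuming the word is loaded from the proquint vocabulary: the address 127.0.0.1 is the integer 0x7f000001, and its two 16-bit halves are encoded independently.

```factor
USING: prettyprint proquint ;

! high half 0x7f00 -> "lusab", low half 0x0001 -> "babad"
0x7f000001 >quint32 .   ! "lusab-babad"
```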

Decoding looks up each character as a consonant or a vowel, shifts its bits into the result, and skips separators:

: quint> ( str -- m )
    0 [
        dup $[ consonant alphabet-inverse ] nth [
            nip [ 4 shift ] [ + ] bi*
        ] [
            dup $[ vowel alphabet-inverse ] nth [
                nip [ 2 shift ] [ + ] bi*
            ] [
                CHAR: - assert=
            ] if*
        ] if*
    ] reduce ;
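A short sketch of the decoder in use, assuming quint> is loaded from the proquint vocabulary. Since the hyphen contributes no bits, the same value comes back with or without it.

```factor
USING: prettyprint proquint ;

"lusab-babad" quint> .   ! 2130706433, which is 0x7f000001
"lusabbabad" quint> .    ! 2130706433, the separator is optional here
```

Any character that is not a consonant, a vowel, or a hyphen raises an error instead of being silently ignored.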

We can use this to make a random password that might be more memorable, although 32 bits is not much entropy, so it would be more secure with more random bits:

: quint-password ( -- quint )
    32 random-bits >quint32 ;
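One way to use more random bits is to mirror the structure of >quint32 one level up. The word below is a hypothetical helper, not something from the vocabulary as described in this post, and it assumes >quint32 is loaded from proquint.

```factor
USING: kernel math math.bitwise proquint random sequences ;

! Hypothetical helper: 64 random bits, encoded as two 32-bit
! proquints joined by "-", giving four 5-letter groups.
: quint-password64 ( -- quint )
    64 random-bits
    [ -32 shift ] keep [ 32 bits >quint32 ] bi@ "-" glue ;
```

The result is 23 characters long and decodes with quint> like any other proquint.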

And we could use our ip-parser vocabulary to make IPv4 addresses more memorable:

: ipv4>quint ( ipv4 -- str )
    ipv4-aton >quint32 ;

: quint>ipv4 ( str -- ipv4 )
    quint> ipv4-ntoa ;

You can see how this works in a test suite that checks the round trip in both directions:

{ t } [
    {
        { "127.0.0.1"       "lusab-babad" }
        { "63.84.220.193"   "gutih-tugad" }
        { "63.118.7.35"     "gutuk-bisog" }
        { "140.98.193.141"  "mudof-sakat" }
        { "64.255.6.200"    "haguz-biram" }
        { "128.30.52.45"    "mabiv-gibot" }
        { "147.67.119.2"    "natag-lisaf" }
        { "212.58.253.68"   "tibup-zujah" }
        { "216.35.68.215"   "tobog-higil" }
        { "216.68.232.21"   "todah-vobij" }
        { "198.81.129.136"  "sinid-makam" }
        { "12.110.110.204"  "budov-kuras" }
    } [
        [ quint>ipv4 = ] [ swap ipv4>quint = ] 2bi and
    ] assoc-all?
] unit-test

This is now available as the proquint vocabulary in a recent nightly build.