Re: Factor

Factor: the language, the theory, and the practice.

Humanhash

Thursday, December 5, 2013

#hash

Zachary Voase published the humanhash project on GitHub, making “human-readable representations of digests”. Below is a compatible implementation in Factor.

To show how it will work, we use the example from his humanhash README:

IN: scratchpad CONSTANT: digest "7528880a986c40e78c38115e640da2a1"

IN: scratchpad digest humanhash .
"three-georgia-xray-jig"

IN: scratchpad digest 6 humanhash-words .
"high-mango-white-oregon-purple-charlie"

IN: scratchpad human-uuid4 2array .
{
    "28129036-75a7-4c87-984b-4b32231e0a0d"
    "nineteen-bluebird-oxygen-edward"
}

Implementation

We need a list of 256 words, one to represent each possible byte:

CONSTANT: default-wordlist {
    "ack" "alabama" "alanine" "alaska" "alpha" "angel" "apart"
    "april" "arizona" "arkansas" "artist" "asparagus" "aspen"
    "august" "autumn" "avocado" "bacon" "bakerloo" "batman" "beer"
    "berlin" "beryllium" "black" "blossom" "blue" "bluebird" "bravo"
    "bulldog" "burger" "butter" "california" "carbon" "cardinal"
    "carolina" "carpet" "cat" "ceiling" "charlie" "chicken" "coffee"
    "cola" "cold" "colorado" "comet" "connecticut" "crazy" "cup"
    "dakota" "december" "delaware" "delta" "diet" "don" "double"
    "early" "earth" "east" "echo" "edward" "eight" "eighteen"
    "eleven" "emma" "enemy" "equal" "failed" "fanta" "fifteen"
    "fillet" "finch" "fish" "five" "fix" "floor" "florida"
    "football" "four" "fourteen" "foxtrot" "freddie" "friend"
    "fruit" "gee" "georgia" "glucose" "golf" "green" "grey" "hamper"
    "happy" "harry" "hawaii" "helium" "high" "hot" "hotel"
    "hydrogen" "idaho" "illinois" "india" "indigo" "ink" "iowa"
    "island" "item" "jersey" "jig" "johnny" "juliet" "july"
    "jupiter" "kansas" "kentucky" "kilo" "king" "kitten" "lactose"
    "lake" "lamp" "lemon" "leopard" "lima" "lion" "lithium" "london"
    "louisiana" "low" "magazine" "magnesium" "maine" "mango" "march"
    "mars" "maryland" "massachusetts" "may" "mexico" "michigan"
    "mike" "minnesota" "mirror" "mississippi" "missouri" "mobile"
    "mockingbird" "monkey" "montana" "moon" "mountain" "muppet"
    "music" "nebraska" "neptune" "network" "nevada" "nine"
    "nineteen" "nitrogen" "north" "november" "nuts" "october" "ohio"
    "oklahoma" "one" "orange" "oranges" "oregon" "oscar" "oven"
    "oxygen" "papa" "paris" "pasta" "pennsylvania" "pip" "pizza"
    "pluto" "potato" "princess" "purple" "quebec" "queen" "quiet"
    "red" "river" "robert" "robin" "romeo" "rugby" "sad" "salami"
    "saturn" "september" "seven" "seventeen" "shade" "sierra"
    "single" "sink" "six" "sixteen" "skylark" "snake" "social"
    "sodium" "solar" "south" "spaghetti" "speaker" "spring"
    "stairway" "steak" "stream" "summer" "sweet" "table" "tango"
    "ten" "tennessee" "tennis" "texas" "thirteen" "three" "timing"
    "triple" "twelve" "twenty" "two" "uncle" "undress" "uniform"
    "uranus" "utah" "vegan" "venus" "vermont" "victor" "video"
    "violet" "virginia" "washington" "west" "whiskey" "white"
    "william" "winner" "winter" "wisconsin" "wolfram" "wyoming"
    "xray" "yankee" "yellow" "zebra" "zulu"
}

One of the inputs is the number of words to produce. An error is produced if fewer bytes are provided than the number of words requested:

ERROR: too-few-bytes seq #words ;

: check-bytes ( seq #words -- seq #words )
    2dup [ length ] [ < ] bi* [ too-few-bytes ] when ; inline

Input is grouped into subsequences, where the number of subsequences is the number of words requested in the output. It’s a little bit odd, but basically makes groups and then puts any remainder in the last group:

: group-words ( seq #words -- groups )
    [ dupd [ length ] [ /i ] bi* group ]
    [ 1 - cut concat suffix ] bi ; inline

Our input bytes are compressed, first by grouping them into words, then by XORing the bytes in each word:

: compress-bytes ( seq #words -- newseq )
    check-bytes group-words [ 0 [ bitxor ] reduce ] map ;

Our input will either be a byte-array, or a hex-string with every two characters representing the hexadecimal value of each byte:

: byte-string ( hexdigest -- seq )
    dup byte-array? [ 2 <groups> [ hex> ] map ] unless ;

Making a humanhash is simply converting the input, compressing into bytes representing each word, looking up the word from the word list, and joining with a requested separator:

: make-humanhash ( hexdigest #words wordlist sep -- hash )
    { [ byte-string ] [ compress-bytes ] [ nths ] [ join ] } spread ;

We provide a way to hash into a requested number of words, or four by default:

: humanhash-words ( hexdigest #words -- hash )
    default-wordlist "-" make-humanhash ;

: humanhash ( hexdigest -- hash )
    4 humanhash-words ;

And since the humanhash project includes a way to create humanhash’d uuids, we do also:

: human-uuid4 ( -- uuid hash )
    uuid4 dup [ CHAR: - = not ] filter humanhash ;