Re: Factor

Factor: the language, the theory, and the practice.

Typed Netstrings

Sunday, March 20, 2011

#parsing

A few hours ago, Zed Shaw tweeted about an experiment creating typed netstrings:

An experiment in tagging netstrings with the types of their data as an alternative to JSON: https://codepad.org/xct0E5acless than a minute ago via web

I thought that an implementation in Factor would contrast nicely with Zed’s Python version. The basic idea is to support four kinds of data types:

  • Text
  • Numbers
  • Lists
  • Dictionaries (e.g., maps or associations)

First, some imports:

USING: arrays combinators formatting hashtables kernel
math.parser sequences splitting ;

An encoded payload value looks something like “{LENGTH}:{PAYLOAD}{TYPE}”. We can write a simple word to parse a string that looks like that into its parts:

: parse-payload ( data -- remain payload payload-type )
    ":" split1 swap string>number cut unclip swapd ;

We can build a simple parser for these “typed netstrings” (deferring for the moment, how we parse lists and dictionaries):

DEFER: parse-dict
DEFER: parse-list

: parse-tnetstring ( data -- remain value )
    parse-payload {
        { CHAR: # [ string>number ] }
        { CHAR: " [ ] }
        { CHAR: } [ parse-dict ] }
        { CHAR: ] [ parse-list ] }
        [ "Invalid payload type: %c" sprintf throw ]
    } case ;

Parsing lists is just repeatedly parsing values until the remainder is exhausted:

: parse-list ( data -- value )
    [ { } ] [
        [ dup empty? not ] [ parse-tnetstring ] produce nip
    ] if-empty ;

Parsing dictionaries is only a little more involved. We need a way to parse successive key/value pairs, checking some simple error conditions:

: parse-pair ( data -- extra value key )
    parse-tnetstring [
        dup [ "Unbalanced dictionary store" throw ] unless
        parse-tnetstring
        dup [ "Invalid value, null not allowed" throw ] unless
    ] dip ;

Then we can build the dictionary, repeatedly parsing key/value pairs:

: parse-dict ( data -- value )
    [ H{ } ] [
        [ dup empty? not ] [ parse-pair swap 2array ] produce
        nip >hashtable
    ] if-empty ;

And, to make the interface easy to use, we wrap our parse-tnetstring word, checking that there was no un-parsed remainder:

: tnetstring ( data -- value )
    parse-tnetstring swap [
        "Had trailing junk: %s" sprintf throw
    ] unless-empty ;

We can show that it works on one of Zed’s more complex examples:

IN: scratchpad "34:5:hello\"22:11:12345678901#4:this\"]}" tnetstring .
H{ { "hello" { 12345678901 "this" } } }

The code (and some tests) for this is on GitHub.

Update: I added support for booleans and implemented writer words to match the reader words above. Everything is on my GitHub.