Typed Netstrings
Sunday, March 20, 2011
A few hours ago, Zed Shaw tweeted about an experiment creating typed netstrings:
An experiment in tagging netstrings with the types of their data as an alternative to JSON: https://codepad.org/xct0E5ac
I thought that an implementation in Factor would contrast nicely with Zed’s Python version. The basic idea is to support four kinds of data types:
- Text
- Numbers
- Lists
- Dictionaries (e.g., maps or associations)
First, some imports:
USING: arrays combinators formatting hashtables kernel
math.parser sequences splitting ;
An encoded payload value looks something like
“{LENGTH}:{PAYLOAD}{TYPE}
”. We can write a simple word to parse a
string that looks like that into its parts:
: parse-payload ( data -- remain payload payload-type )
":" split1 swap string>number cut unclip swapd ;
We can build a simple parser for these “typed netstrings” (deferring for the moment, how we parse lists and dictionaries):
DEFER: parse-dict
DEFER: parse-list
: parse-tnetstring ( data -- remain value )
parse-payload {
{ CHAR: # [ string>number ] }
{ CHAR: " [ ] }
{ CHAR: } [ parse-dict ] }
{ CHAR: ] [ parse-list ] }
[ "Invalid payload type: %c" sprintf throw ]
} case ;
Parsing lists is just repeatedly parsing values until the remainder is exhausted:
: parse-list ( data -- value )
[ { } ] [
[ dup empty? not ] [ parse-tnetstring ] produce nip
] if-empty ;
Parsing dictionaries is only a little more involved. We need a way to parse successive key/value pairs, checking some simple error conditions:
: parse-pair ( data -- extra value key )
parse-tnetstring [
dup [ "Unbalanced dictionary store" throw ] unless
parse-tnetstring
dup [ "Invalid value, null not allowed" throw ] unless
] dip ;
Then we can build the dictionary, repeatedly parsing key/value pairs:
: parse-dict ( data -- value )
[ H{ } ] [
[ dup empty? not ] [ parse-pair swap 2array ] produce
nip >hashtable
] if-empty ;
And, to make the interface easy to use, we wrap our parse-tnetstring
word, checking that there was no un-parsed remainder:
: tnetstring ( data -- value )
parse-tnetstring swap [
"Had trailing junk: %s" sprintf throw
] unless-empty ;
We can show that it works on one of Zed’s more complex examples:
IN: scratchpad "34:5:hello\"22:11:12345678901#4:this\"]}" tnetstring .
H{ { "hello" { 12345678901 "this" } } }
The code (and some tests) for this is on GitHub.
Update: I added support for booleans and implemented writer words to match the reader words above. Everything is on my GitHub.