Data Formats

Thursday, July 31, 2025

A data format is a standardized way of encoding, storing, and representing data, allowing different software applications to interpret and process it. I was reminded of this recently when someone shared a link to “From XML to JSON to CBOR”, which discusses three pivotal data formats and their evolution.

Some data formats that Factor supports in its extensive standard library:

Bencode or BitTorrent encoding
BSON or Binary JSON
CBOR or Concise Binary Object Representation
CSV or Comma-separated values
JSON or JavaScript Object Notation
MessagePack – It’s like JSON, but fast and small
TOML or Tom’s Obvious Minimal Language
TXON or Text Object Notation
XML or Extensible Markup Language
YAML or YAML Ain’t Markup Language

Most of these are general purpose and can encode most basic object types, including nested structures. There are some exceptions: csv doesn’t support nesting, txon supports only string keys and values, and xml requires some manual object-to-XML conversion.

In any event, here is an example showing data that round-trips through seven different data formats:

IN: scratchpad LH{
                   { "name" "Factor" }
                   { "age" 22 }
                   { "list" { 4 8 15 16 23 42 } }
                   { "map" { LH{ { "one" 1 } { "two" 2 } } } }
               } [
                   >json json>
                   >msgpack msgpack>
                   >toml toml>
                   >cbor cbor>
                   >bson bson>
                   >bencode bencode>
                   >yaml yaml>
               ] keep = .
t

There are two more data formats that might not be obvious, but also round-trip:

The serialize vocabulary: object>bytes bytes>object
The prettyprint vocabulary: [ unparse ] without-limits eval( -- obj )
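
As a minimal sketch of those two round-trips, here is a listener session that uses only the words named above. The sample array is an arbitrary value chosen for illustration, and the session assumes the eval, prettyprint.config, and serialize vocabularies are available to the listener:

```factor
IN: scratchpad USING: eval prettyprint prettyprint.config serialize ;

! Binary round-trip: serialize to bytes, then read the object back.
IN: scratchpad { "Factor" 22 { 4 8 15 16 23 42 } }
               [ object>bytes bytes>object ] keep = .
t

! Text round-trip: print as source code, then parse and evaluate it.
IN: scratchpad { "Factor" 22 { 4 8 15 16 23 42 } }
               [ [ unparse ] without-limits eval( -- obj ) ] keep = .
t
```

The without-limits wrapper matters for the second case: with the default printing limits, a large or deeply nested object may be abbreviated by unparse, and the abbreviated text would not evaluate back to the original.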

And there are probably many more useful ones we could add to the standard library. For example, Zig has a new data format I’d love to support someday called ZON or Zig Object Notation.

PRs welcome!