Data Formats
Thursday, July 31, 2025
A data format is a standardized way of encoding, storing, and representing data, allowing different software applications to interpret and process it. I was reminded of this recently when someone shared a link to “From XML to JSON to CBOR”, which discusses three pivotal data formats and their evolution.
Some data formats that Factor supports in its extensive standard library:
- Bencode or BitTorrent encoding
- BSON or Binary JSON
- CBOR or Concise Binary Object Representation
- CSV or Comma-separated values
- JSON or JavaScript Object Notation
- MessagePack – It’s like JSON, but fast and small
- TOML or Tom’s Obvious Minimal Language
- TXON or Text Object Notation
- XML or Extensible Markup Language
- YAML or YAML Ain’t Markup Language (originally Yet Another Markup Language)
Most of these are general purpose and can encode most basic object types, including nested structures. There are some exceptions: for example, csv doesn’t support nesting, txon supports only string keys and values, toml currently only supports parsing, and xml requires some manual object-to-XML conversion.
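To make the parse-only caveat concrete, here is a minimal sketch of reading a small TOML document. It assumes the toml vocabulary exposes a `toml>` word that takes a string and returns an assoc, following the naming pattern of the other vocabularies; the sample document is made up for illustration.

```factor
USING: prettyprint toml ;

! Parse a two-key TOML document (hypothetical sample) and print the result.
! There is no matching >toml call here, since only parsing is supported.
"name = \"Factor\"\nage = 22" toml> .
```

Because there is no writer, toml is not one of the formats in the round-trip example.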
In any event, here is an example showing data that round-trips through six different data formats:
IN: scratchpad LH{
        { "name" "Factor" }
        { "age" 22 }
        { "list" { 4 8 15 16 23 42 } }
        { "map" { LH{ { "one" 1 } { "two" 2 } } } }
    } [
        >json json>
        >msgpack msgpack>
        >cbor cbor>
        >bson bson>
        >bencode bencode>
        >yaml yaml>
    ] keep = .
t
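To see what a single step of that pipeline does, it can help to look at the encoded form before decoding it again. This is a rough sketch that assumes a recent Factor in which `>json` and `json>` live in the json vocabulary (older releases split them across separate reader and writer vocabularies); the two-key object is just a cut-down version of the one above.

```factor
USING: io json kernel linked-assocs prettyprint ;

LH{ { "name" "Factor" } { "age" 22 } }
dup >json print    ! print the encoded JSON text, keeping the object on the stack
>json json> .      ! encode again, decode, and print the decoded object
```

The same pattern should work for the other pairs, although the binary formats produce byte arrays, so `.` is a better fit than `print` for inspecting them.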
There are two more data formats that might not be obvious, but also round-trip:
- The serialize vocabulary: object>bytes bytes>object
- The prettyprint vocabulary: [ unparse ] without-limits eval( -- obj )
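Here is a small sketch of both, typed at the listener, using the same keep-and-compare pattern as the earlier example. The sample values are made up for illustration, and each line prints t if the round trip preserves the value.

```factor
USING: eval kernel prettyprint prettyprint.config serialize ;

! Binary round trip through the serialize vocabulary.
H{ { "name" "Factor" } { "age" 22 } }
[ object>bytes bytes>object ] keep = .

! Source-text round trip: unparse to Factor syntax, then evaluate that text.
{ 4 8 15 16 23 42 }
[ [ unparse ] without-limits eval( -- obj ) ] keep = .
```

The without-limits wrapper matters for larger values, since the prettyprinter would otherwise elide long sequences and deep nesting, and the elided text would not evaluate back to the original object.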
And there are probably many more useful ones we could add to the standard library. For example, Zig has a new data format I’d love to support someday called ZON or Zig Object Notation.
PRs welcome!