Re: Factor

Factor: the language, the theory, and the practice.

Data Formats

Thursday, July 31, 2025

#data

A data format is a standardized way of encoding, storing, and representing data, allowing different software applications to interpret and process it. I was reminded of this recently when someone shared a link to “From XML to JSON to CBOR”, which discusses three pivotal data formats and their evolution.

Some data formats that Factor supports in its extensive standard library:

  • Bencode or BitTorrent encoding
  • BSON or Binary JSON
  • CBOR or Concise Binary Object Representation
  • CSV or Comma-separated values
  • JSON or JavaScript Object Notation
  • MessagePack, “It’s like JSON, but fast and small”
  • TOML or Tom’s Obvious Minimal Language
  • TXON or Text Object Notation
  • XML or Extensible Markup Language
  • YAML or YAML Ain’t Markup Language (originally Yet Another Markup Language)

Most of these are general purpose and can encode most basic object types, including nested structures. There are some exceptions: csv doesn’t support nesting, txon supports only string keys and values, toml currently supports only parsing, and xml requires some manual object-to-XML conversion.
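
To make the csv exception concrete, here is a minimal sketch. It assumes the csv vocabulary's csv>string and string>csv words, and the sample rows are made up for illustration. A CSV document is just rows of string cells, so the number has to be written as a string and there is nowhere to put a nested list or map:

```factor
USING: csv io kernel prettyprint ;

! Each row is a flat sequence of strings; a cell cannot hold a list or a map.
{ { "name" "age" } { "Factor" "22" } }
csv>string     ! encode the rows as CSV text
dup print      ! show the encoded text
string>csv .   ! decode the text back into rows of strings
```

Anything richer than rows of strings needs its own convention layered on top, which is why csv is left out of the round-trip example.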

In any event, here is an example showing data that round-trips through six different data formats:

IN: scratchpad LH{
                   { "name" "Factor" }
                   { "age" 22 }
                   { "list" { 4 8 15 16 23 42 } }
                   { "map" { LH{ { "one" 1 } { "two" 2 } } } }
               } [
                   >json json>
                   >msgpack msgpack>
                   >cbor cbor>
                   >bson bson>
                   >bencode bencode>
                   >yaml yaml>
               ] keep = .
t
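
To see what one link in that chain does, here is a smaller sketch that stops to look at the encoded form. It assumes >json and json> come from the json vocabulary and LH{ from linked-assocs; the other formats should follow the same >format format> naming pattern used above:

```factor
USING: json kernel linked-assocs prettyprint ;

LH{ { "name" "Factor" } { "age" 22 } }
dup >json      ! encode a copy of the object as a JSON string
dup .          ! print the encoded text
json> = .      ! decode it and compare against the original object
```

The quotation in the larger example is this same encode/decode pair repeated once per format, with keep holding on to the original object for the final comparison.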

There are two more data formats that might not be obvious, but also round-trip:

And there are probably many more useful ones we could add to the standard library. For example, Zig has a new data format I’d love to support someday called ZON or Zig Object Notation.

PRs welcome!