Re: Factor

Factor: the language, the theory, and the practice.

Dotenv

Tuesday, June 17, 2025

#parsing

Dotenv is an informal file specification, a collection of implementations in different languages, and an organization providing cloud-hosting services. They describe the .env file format and some extensions:

The .env file format is central to good DSX and has been since it was introduced by Heroku in 2012 and popularized by the dotenv node module (and other libraries) in 2013.

The .env file format starts where the developer starts - in development. It is added to each project but NOT committed to source control. This gives the developer a single secure place to store sensitive application secrets.

Can you believe that prior to introducing the .env file, almost all developers stored their secrets as hardcoded strings in source control. That was only 10 years ago!

Besides official and many unofficial .env parsers available in a lot of languages, the Dotenv organization provides support for dotenv-vault cloud services in Node.js, Python, Ruby, Go, PHP, and Rust.

Today, I wanted to show how you might implement a .env parser in Factor.

File Format

The .env files are relatively simple formats with key-value pairs that are separated by an equal sign. These values can be un-quoted, single-quoted, double-quoted, or backtick-quoted strings:

SIMPLE=xyz123
INTERPOLATED="Multiple\nLines"
NON_INTERPOLATED='raw text without variable interpolation'
MULTILINE = `long text here,
e.g. a private SSH key`

Parsing

There are a lot of ways to build a parser – everything from manually spinning through bytes using a hand-coded state machine, higher-level parsing grammars like PEG, or explicit parsing syntax forms like EBNF.

We are going to implement a .env parser using standard PEG parsers, beginning with some parsers that look for whitespace, comment lines, and newlines:

: ws ( -- parser )
    [ " \t" member? ] satisfy repeat0 ;

: comment ( -- parser )
    "#" token [ CHAR: \n = not ] satisfy repeat0 2seq hide ;

: newline ( -- parser )
    "\n" token "\r\n" token 2choice ;

Keys

The .env keys are specified simply:

For the sake of portability (and sanity), environment variable names (keys) must consist solely of letters, digits, and the underscore ( _ ) and must not begin with a digit. In regex-speak, the names must match the following pattern:

[a-zA-Z_]+[a-zA-Z0-9_]*

We can build a key parser by looking for those characters:

: key-parser ( -- parser )
    CHAR: A CHAR: Z range
    CHAR: a CHAR: z range
    [ CHAR: _ = ] satisfy 3choice

    CHAR: A CHAR: Z range
    CHAR: a CHAR: z range
    CHAR: 0 CHAR: 9 range
    [ CHAR: _ = ] satisfy 4choice repeat0

    2seq [ first2 swap prefix "" like ] action ;

Values

The .env values can be un-quoted, single-quoted, double-quoted, or backtick-quoted strings. Only double-quoted strings support escape characters, but single-quoted and backtick-quoted strings support escaping either single-quotes or backtick characters.

: single-quote ( -- parser )
    "\\" token hide [ "\\'" member? ] satisfy 2seq [ first ] action
    [ CHAR: ' = not ] satisfy 2choice repeat0 "'" dup surrounded-by ;

: backtick ( -- parser )
    "\\" token hide [ "\\`" member? ] satisfy 2seq [ first ] action
    [ CHAR: ` = not ] satisfy 2choice repeat0 "`" dup surrounded-by ;

: double-quote ( -- parser )
    "\\" token hide [ "\"\\befnrt" member? ] satisfy 2seq [ first escape ] action
    [ CHAR: " = not ] satisfy 2choice repeat0 "\"" dup surrounded-by ;

: literal ( -- parser )
    [ " \t\r\n" member? not ] satisfy repeat0 ;

Before we implement our value parser, we should note that some values can be interpolated:

Interpolation (also known as variable expansion) is supported in environment files. Interpolation is applied for unquoted and double-quoted values. Both braced (${VAR}) and unbraced ($VAR) expressions are supported.

Direct interpolation

  • ${VAR} -> value of VAR

Default value

  • ${VAR:-default} -> value of VAR if set and non-empty, otherwise default
  • ${VAR-default} -> value of VAR if set, otherwise default

And some values can have command substitution:

Add the output of a command to one of your variables in your .env file. Command substitution is applied for unquoted and double-quoted values.

Direct substitution

  • $(whoami) -> value of $ whoami

We can implement an interpolate parser that acts on strings and replaces observed variables with their interpolated or command-substituted values. This uses a regular expressions and re-replace-with to substitute values appropriately.

: interpolate-value ( string -- string' )
    R/ \$\([^)]+\)|\$\{[^\}:-]+(:?-[^\}]*)?\}|\$[^(^{].+/ [
        "$(" ?head [
            ")" ?tail drop process-contents [ blank? ] trim
        ] [
            "${" ?head [ "}" ?tail drop ] [ "$" ?head drop ] if
            ":-" split1 [
                [ os-env [ empty? not ] keep ] dip ?
            ] [
                "-" split1 [ [ os-env ] dip or ] [ os-env ] if*
            ] if*
        ] if
    ] re-replace-with ;

: interpolate ( parser -- parser )
    [ "" like interpolate-value ] action ;

We can use that to build a value parser, remembering that only un-quoted and double-quoted values are interpolated, and making sure to convert the result to a string:

: value-parser ( -- parser )
    [
        single-quote ,
        double-quote interpolate ,
        backtick ,
        literal interpolate ,
    ] choice* [ "" like ] action ;

Key-Values

Combining those, we can make a key-value parser, that ignores whitespace around the = token and uses set-os-env to update the environment variables:

: key-value-parser ( -- parser )
    [
        key-parser ,
        ws hide ,
        "=" token hide ,
        ws hide ,
        value-parser ,
    ] seq* [ first2 swap set-os-env ignore ] action ;

And finally, we can build a parsing word that looks for these key-value pairs while ignoring optional comments and whitespace:

PEG: parse-dotenv ( string -- ast )
    ws hide key-value-parser optional
    ws hide comment optional hide 4seq
    newline list-of hide ;

Loading Files

We can load a file by reading the file-contents and then parsing it into environment variables:

: load-dotenv-file ( path -- )
    utf8 file-contents parse-dotenv drop ;

These .env files are usually located somewhere above the current directory, typically at a project root. For now, we make a word that traverses from the current directory up to the root, looking for the first .env file that exists:

: find-dotenv-file ( -- path/f )
    f current-directory get absolute-path [
        nip
        [ ".env" append-path dup file-exists? [ drop f ] unless ]
        [ ?parent-directory ] bi over [ f ] [ dup ] if
    ] loop drop ;

And now, finally, we can find and then load the relevant .env file, if there is one:

: load-dotenv ( -- )
    find-dotenv-file [ load-dotenv-file ] when* ;

Some additional features that we might want to follow up on:

This is available in the latest development version. Check it out!