Parsing Configuration Files
Friday, July 16, 2010
The other day I needed a parser for INI-style configuration files. When I couldn’t find a convenient Factor vocabulary to do this, I decided to write one.
A basic configuration file could look like this:
[owner]
name=John Doe
e-mail=john.doe@example.com
[database]
host=127.0.0.1 # change to production when ready
port=1234
username=test
password="a really long string"
These configurations are essentially groups of name/value pairs, and can be naturally expressed as an assoc that maps each section name to an assoc of its name/value pairs. We will be implementing a simple API for reading and writing:
: read-ini ( -- assoc )
: write-ini ( assoc -- )
: string>ini ( str -- assoc )
: ini>string ( assoc -- str )
This implementation uses these vocabularies:
USING: arrays assocs combinators formatting hashtables io
io.streams.string kernel make math sequences strings
strings.parser ;
Some utility words are used to trim spaces from tokens, extract strings from section names (e.g., “[database]”), and remove comments from lines:
: unspace ( str -- str' )
    [ " \t\n\r" member? ] trim ;
: unwrap ( str -- str' )
    1 swap [ length 1 - ] keep subseq ;
: uncomment ( str -- str' )
    CHAR: # over index [ head ] when* ;
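As a quick check of these helpers, the following listener snippet is a sketch that assumes the three words above are already loaded. The sample strings are made up for illustration, and the "." word comes from the prettyprint vocabulary:

```factor
USING: prettyprint ;

! Leading and trailing whitespace is trimmed.
"  name  " unspace .                  ! prints "name"

! The first and last characters are dropped.
"[database]" unwrap .                 ! prints "database"

! Everything from the first '#' onward is removed; the space
! before it remains until unspace is applied.
"host=127.0.0.1 # note" uncomment .   ! prints "host=127.0.0.1 "
```

Note that uncomment leaves the trailing space in place, which is why the parser applies unspace after it.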
There are a variety of parsing strategies we could use here. To keep things simple, we will be parsing the configuration file line-by-line. Also, we will make the assumption that each line contains either a “[section]” or a “name=value” (but not both).
We know a line is a section if it starts with ‘[’ and ends with ‘]’:
: section? ( line -- ? )
    [ first CHAR: [ = ] [ last CHAR: ] = ] bi and ;
The current section is parsed and stored as a two-element array containing the name of the section and a vector of name/value pairs:
: [section] ( line -- section )
    unwrap unspace V{ } clone 2array ;
Each name/value is parsed and added to the vector of name/value pairs in the current section:
: name=value ( section line -- section' )
    CHAR: = over index cut rest [ unspace ] bi@
    2array over second push ;
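To see how these pieces fit together, here is a small sketch that builds one section by hand. It assumes the words defined so far are loaded, and the sample lines are illustrative only:

```factor
USING: prettyprint ;

! Start a section, then add two name/value pairs to it.
"[database]" [section]
"host = 127.0.0.1" name=value
"port=1234" name=value
.
! The printed value should look something like:
! { "database" V{ { "host" "127.0.0.1" } { "port" "1234" } } }
```

Spaces around the ‘=’ are trimmed by unspace. Note that name=value assumes the line contains an ‘=’ and that a section has already been started. A line that breaks either assumption would raise an error in this simple version.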
We will be using the make words. When we encounter a new section, or the end of the file, we will append the current section to the sequence of sections being built by make:
: section, ( section/f -- )
    [ first2 >hashtable 2array , ] when* ;
: parse-line ( section line -- section' )
    uncomment unspace [
        dup section?
        [ swap section, [section] ] [ name=value ] if
    ] unless-empty ;
: read-ini ( -- assoc )
    [
        f [ parse-line ] each-line section,
    ] { } make >hashtable ;
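For example, read-ini can be pointed at a file by rebinding the input stream. This is a sketch that assumes a UTF-8 file named config.ini (a hypothetical path) holding the sample configuration shown earlier:

```factor
USING: assocs io.encodings.utf8 io.files prettyprint ;

! with-file-reader makes the file the current input stream
! for the duration of the quotation.
"config.ini" utf8 [ read-ini ] with-file-reader

! Look up a value: the section first, then the name.
"database" swap at "port" swap at .   ! prints "1234"
```

All values come back as strings, so a numeric setting such as the port would still need a conversion (for instance with string>number from math.parser).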
Implementing write-ini is pretty easy. It’s just a matter of iterating over each section in the specified assoc and printing its name/value pairs with some minor structure:
: write-ini ( assoc -- )
    [
        [ "[%s]\n" printf ] dip
        [ "%s=%s\n" printf ] assoc-each
        nl
    ] assoc-each ;
The string>ini and ini>string words are easy too. Both the read-ini and write-ini words operate on input and output streams, so we can use string streams:
: string>ini ( str -- assoc )
    [ read-ini ] with-string-reader ;
: ini>string ( assoc -- str )
    [ write-ini ] with-string-writer ;
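A quick round trip shows the two words working together. This is a sketch that assumes all of the words above are loaded, and the input string is just an example:

```factor
USING: io ;

! Parse a small configuration from a string, then print the
! regenerated text: the section header, its one pair, and the
! blank line that write-ini emits after each section.
"[owner]\nname=John Doe\n" string>ini ini>string print
```

Because sections and their pairs are stored in hashtables, a configuration with several entries may come back in a different order than it was written, and any comments are dropped along the way.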
This was a really simple implementation. In addition to the basics, I wanted to be able to support:
- Embedded escape characters (e.g., “\t”, “\n”, etc.).
- Line continuations (e.g., multi-line values).
- Java .properties files.
- Liberal parsing of minor formatting errors.
- Both ‘#’ and ‘;’ comment characters.
- Quoted strings (e.g., name="value").
You can find all that and more (along with tests and some documentation) on my GitHub. I hope to contribute it to the main repository soon.
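As an illustration of one item on that list, supporting both comment characters could be sketched as a variant of uncomment. The name uncomment* is hypothetical, and this is not necessarily how the GitHub version does it:

```factor
USING: kernel sequences ;

! Cut the line at the first '#' or ';', whichever comes first.
! If neither is present, the line is returned unchanged.
: uncomment* ( str -- str' )
    dup [ "#;" member? ] find drop [ head ] when* ;
```

Like the simple uncomment above, this would also cut at a comment character that appears inside a quoted value, so quoted strings would need to be handled before comments are stripped.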