Re: Factor

Factor: the language, the theory, and the practice.


Friday, April 26, 2013


While investigating how to determine if a terminal is color capable, I re-discovered terminfo databases. These database files store the capabilities of terminals in a device-independent manner.


The simple answer is to use the tput program to look up the terminal's capabilities (it reads the TERM environment variable):

$ TERM=xterm-256color tput colors

$ TERM=xterm tput colors

$ TERM=vt100 tput colors

If you trace the system calls that tput makes, you will see that it is loading a terminfo file to provide the answer:

stat64("/usr/share/terminfo\0", 0x7FFF5F21A120, 0x7FB279403AD0)
access("/usr/share/terminfo/78/xterm-256color\0", 0x4, 0xE)
open("/usr/share/terminfo/78/xterm-256color\0", 0x0, 0x0)
read(0x3, "\032\001%\0", 0x1001)

I wanted access to these capabilities from Factor without running tput, so I chose to parse the terminfo files directly.


The compiled terminfo file is created by the tic program and begins with a header containing six two-byte short integers, stored in little-endian byte order:

  1. the magic number (octal 0432)
  2. the size, in bytes, of the names section
  3. the number of bytes in the boolean section
  4. the number of short integers in the numbers section
  5. the number of offsets (short integers) in the strings section
  6. the size, in bytes, of the string table

We can parse this header pretty easily using the pack vocabulary:

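TUPLE: terminfo-header
    names-bytes boolean-bytes #numbers #strings string-bytes ;

C: <terminfo-header> terminfo-header

: read-header ( -- header )
    ! six little-endian signed shorts; the first is the magic number
    12 read "ssssss" unpack-le unclip
    0o432 = [ "bad magic" throw ] unless
    ! the remaining five shorts become the header slots, in order
    5 firstn <terminfo-header> ;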
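For comparison outside Factor, here is a minimal Python sketch of the same header parsing, assuming the whole file has already been read into a bytes object. The format string "<6h" reads six little-endian signed shorts, matching "ssssss" unpack-le above; the name parse_header and the sample sizes are made up for illustration. Like the Factor word, it accepts only the 0o432 magic number; newer ncurses releases can also write an extended format with magic number 0o1036 and four-byte numbers, which this sketch rejects.

```python
import struct

MAGIC = 0o432  # legacy compiled terminfo magic number

def parse_header(data):
    """Unpack the 12-byte header as six little-endian signed shorts."""
    magic, names, bools, nums, strs, table = struct.unpack_from("<6h", data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic: %o" % magic)
    # the remaining five shorts are the section sizes, in file order
    return {
        "names_bytes": names,
        "boolean_bytes": bools,
        "num_numbers": nums,
        "num_strings": strs,
        "string_bytes": table,
    }

# made-up header: magic number followed by five section sizes
sample = struct.pack("<6h", 0o432, 15, 38, 15, 413, 1342)
print(parse_header(sample))
```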

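The names section comes next. It contains the various names for the terminal, separated by a “|” character and terminated by a NUL (zero) byte. The names size in the header includes that NUL, which is why the last byte is dropped before splitting: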

: read-names ( header -- names )
    names-bytes>> read but-last "|" split [ >string ] map ;
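A Python sketch of the same step, assuming section holds exactly the names_bytes bytes read from the file; decoding as latin-1 maps each byte to one character, much like >string does. The name parse_names and the sample bytes are illustrative.

```python
def parse_names(section):
    """Split the NUL-terminated, '|'-separated names section."""
    # the size in the header counts the trailing NUL, so drop it first
    if not section.endswith(b"\0"):
        raise ValueError("names section is not NUL-terminated")
    return section[:-1].decode("latin-1").split("|")

print(parse_names(b"xterm-256color|xterm with 256 colors\0"))
# ['xterm-256color', 'xterm with 256 colors']
```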

The boolean section is stored as one byte per boolean flag, either a 0 or 1:

: read-booleans ( header -- booleans )
    boolean-bytes>> read [ 1 = ] { } map-as ;
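In Python the boolean section decodes the same way, since iterating over a bytes object yields integers. This sketch (parse_booleans is an illustrative name) treats any byte other than 1 as false, as the Factor code does.

```python
def parse_booleans(section):
    """Decode one byte per flag: 1 means the capability is present."""
    # iterating over bytes yields ints, so compare each one with 1
    return [byte == 1 for byte in section]

print(parse_booleans(b"\x00\x01\x01\x00"))  # [False, True, True, False]
```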

The numbers section is stored as a sequence of two-byte short integers aligned on an even byte. The header is 12 bytes, so if the names and boolean sections together consume an odd number of bytes, an extra padding byte is inserted and must be skipped so that the numbers start on an even byte. A negative value marks a capability the terminal does not have, so read-shorts converts it to f:

: read-shorts ( n -- seq )
    ! read n little-endian signed shorts; negative means absent, so use f
    2 * read 2 <groups> [ signed-le> dup 0 < [ drop f ] when ] map ;

: align-even-bytes ( header -- )
    [ names-bytes>> ] [ boolean-bytes>> ] bi + odd?
    [ read1 drop ] when ;

: read-numbers ( header -- numbers )
    [ align-even-bytes ] [ #numbers>> read-shorts ] bi ;
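The alignment rule is easy to get wrong, so here is a small Python sketch of it, assuming the file is held in a bytes object and offsets are counted from its start. numbers_offset and parse_shorts are hypothetical helper names, and None plays the role of Factor's f.

```python
import struct

HEADER_SIZE = 12  # six two-byte shorts

def numbers_offset(names_bytes, boolean_bytes):
    """Byte offset where the numbers section starts, after any padding."""
    offset = HEADER_SIZE + names_bytes + boolean_bytes
    # the header is even, so only names + booleans can make this odd
    if offset % 2:
        offset += 1  # skip the single padding byte
    return offset

def parse_shorts(data, offset, count):
    """Read count little-endian signed shorts; negative values become None."""
    values = struct.unpack_from("<%dh" % count, data, offset)
    return [v if v >= 0 else None for v in values]

print(numbers_offset(15, 38))  # 66: 12 + 15 + 38 = 65 is odd, so one byte is skipped
print(parse_shorts(struct.pack("<3h", 80, -1, 8), 0, 3))  # [80, None, 8]
```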

The strings are more complex and are stored in two sections. The first is a sequence of two-byte short integers, and the second is the string table, a sequence of bytes. To rebuild each string capability, interpret its integer as an offset into the string table and read from there up to the next NUL byte (an offset that read-shorts turned into f means the capability is absent):

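: read-strings ( header -- strings )
    [ #strings>> read-shorts ] [ string-bytes>> read ] bi '[
        ! each offset points into the string table; the string runs
        ! up to the next NUL byte, and an absent (f) offset stays f
        [ _ 0 2over index-from swap subseq >string ] [ f ] if*
    ] map ;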
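A Python sketch of the string resolution, assuming the offsets have already been read with negative values mapped to None (as read-shorts does with f) and that table holds the raw string table. parse_strings and the sample table are illustrative.

```python
def parse_strings(offsets, table):
    """Resolve each offset to the NUL-terminated string it points at."""
    result = []
    for off in offsets:
        if off is None:  # negative offset: capability absent
            result.append(None)
            continue
        end = table.index(b"\0", off)  # scan forward to the terminator
        result.append(table[off:end].decode("latin-1"))
    return result

# made-up table holding two strings: BEL, then "ESC [ H ESC [ 2 J"
table = b"\x07\0\x1b[H\x1b[2J\0"
print(parse_strings([0, None, 2], table))
```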

Putting this all together, we can “parse” our terminfo file into an object:

TUPLE: terminfo names booleans numbers strings ;

C: <terminfo> terminfo

: read-terminfo ( -- terminfo )
    read-header {
        [ read-names ]
        [ read-booleans ]
        [ read-numbers ]
        [ read-strings ]
    } cleave <terminfo> ;

Finally, we can write a word that reads a terminfo file into a terminfo object:

: file>terminfo ( path -- terminfo )
    binary [ read-terminfo ] with-file-reader ;
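Putting the same steps together in Python gives a compact sketch of file>terminfo. It assumes the legacy 0o432 format, returns a plain dict rather than a tuple, and uses None where the Factor code uses f; parse_terminfo and file_to_terminfo are illustrative names.

```python
import struct

def parse_terminfo(data):
    """Parse a legacy (magic 0o432) compiled terminfo entry from raw bytes."""
    (magic, n_names, n_bools,
     n_nums, n_strs, n_table) = struct.unpack_from("<6h", data, 0)
    if magic != 0o432:
        raise ValueError("bad magic")
    pos = 12
    # names: drop the trailing NUL, then split on "|"
    names = data[pos:pos + n_names - 1].decode("latin-1").split("|")
    pos += n_names
    booleans = [b == 1 for b in data[pos:pos + n_bools]]
    pos += n_bools
    if pos % 2:  # numbers start on an even byte
        pos += 1
    numbers = [v if v >= 0 else None
               for v in struct.unpack_from("<%dh" % n_nums, data, pos)]
    pos += 2 * n_nums
    offsets = struct.unpack_from("<%dh" % n_strs, data, pos)
    pos += 2 * n_strs
    table = data[pos:pos + n_table]
    # each non-negative offset names a NUL-terminated string in the table
    strings = [table[o:table.index(b"\0", o)].decode("latin-1") if o >= 0 else None
               for o in offsets]
    return {"names": names, "booleans": booleans,
            "numbers": numbers, "strings": strings}

def file_to_terminfo(path):
    """Read a compiled terminfo file in binary mode and parse it."""
    with open(path, "rb") as f:
        return parse_terminfo(f.read())
```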


The terminfo files are stored in subdirectories of /usr/share/terminfo. To get a list of all available terminfo files, we can list the contents of each subdirectory:

MEMO: terminfo-names ( -- names )
    "/usr/share/terminfo" [
        [ directory-files ] map concat
    ] with-directory-files ;
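A rough Python equivalent of terminfo-names, assuming the same layout of one level of subdirectories under /usr/share/terminfo (the location varies between systems). Unlike the Factor MEMO:, this sketch does not cache its result, and it sorts the names; terminfo_names is an illustrative name.

```python
import os

TERMINFO_DIR = "/usr/share/terminfo"  # assumed location

def terminfo_names(root=TERMINFO_DIR):
    """Collect entry names from every subdirectory of root."""
    names = []
    for sub in os.listdir(root):
        path = os.path.join(root, sub)
        if os.path.isdir(path):  # skip any stray files at the top level
            names.extend(os.listdir(path))
    return sorted(names)
```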

If we instead want to look up a specific terminal, we can map its name to a file path. On Mac OS, the files are stored in a subdirectory named after the hexadecimal representation of the first byte of the name (“78” for “xterm”, as in the trace above). On Linux, the subdirectory is named after the first character itself:

HOOK: terminfo-path os ( name -- path )

M: macosx terminfo-path ( name -- path )
    [ first >hex ] keep "/usr/share/terminfo/%s/%s" sprintf ;

M: linux terminfo-path ( name -- path )
    [ first ] keep "/usr/share/terminfo/%c/%s" sprintf ;
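The same mapping in Python, as a sketch that keys off sys.platform: "darwin" gets the hex-named subdirectory, and every other platform is assumed to use the Linux single-character layout. The platform parameter exists only so both branches can be exercised; terminfo_path is an illustrative name.

```python
import sys

def terminfo_path(name, platform=sys.platform, root="/usr/share/terminfo"):
    """Map a terminal name to its compiled terminfo file path."""
    if platform == "darwin":
        # Mac OS: first byte in lowercase hex, e.g. 'x' -> '78'
        subdir = "%x" % ord(name[0])
    else:
        # Linux (assumed for everything else): the first character itself
        subdir = name[0]
    return "%s/%s/%s" % (root, subdir, name)

print(terminfo_path("xterm-256color", "darwin"))  # /usr/share/terminfo/78/xterm-256color
print(terminfo_path("xterm-256color", "linux"))   # /usr/share/terminfo/x/xterm-256color
```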


With just this much implemented, we can look up the “max_colors” capability, knowing it is the 14th number (index 13) in the numbers section:

: max-colors ( name -- n/f )
    terminfo-path file>terminfo numbers>> 13 swap ?nth ;

IN: scratchpad "xterm-256color" max-colors .

IN: scratchpad "xterm" max-colors .

IN: scratchpad "vt100" max-colors .
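To show where index 13 comes from, here is a Python sketch with the first fourteen numeric capability names in what I take to be the standard ncurses order; the list is an assumption to check against term.h, not something read from the file. Like ?nth, max_colors returns None when the numbers section is too short.

```python
# Assumed order of the first numeric capabilities (ncurses ordering);
# max_colors is the 14th entry, i.e. index 13.
NUMBER_NAMES = [
    "columns", "init_tabs", "lines", "lines_of_memory",
    "magic_cookie_glitch", "padding_baud_rate", "virtual_terminal",
    "width_status_line", "num_labels", "label_height", "label_width",
    "max_attributes", "maximum_windows", "max_colors",
]

def max_colors(numbers):
    """Return max_colors from a parsed numbers list, or None if absent."""
    index = NUMBER_NAMES.index("max_colors")
    # mirror ?nth: out-of-range indexes give None instead of raising
    return numbers[index] if index < len(numbers) else None
```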

I also added support for parsing all the capabilities into a hashtable, allowing lookup by name rather than by offset as we did above.

This is available now in the terminfo vocabulary.
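The terminfo vocabulary's own API is not shown here, but the idea behind named lookup can be sketched in Python: pair each capability name with its value in file order and skip absent values. The three names below (columns, init_tabs, lines) are the first entries of the numbers section; named_capabilities is a hypothetical helper.

```python
NUMBER_NAMES = ["columns", "init_tabs", "lines"]  # prefix of the numeric names

def named_capabilities(names, values):
    """Build a name -> value table, skipping capabilities that are absent."""
    table = {}
    # zip stops at the shorter list, so files with fewer entries still work
    for name, value in zip(names, values):
        if value is not None:
            table[name] = value
    return table

caps = named_capabilities(NUMBER_NAMES, [80, 8, 24])
print(caps["lines"])  # 24
```

Pairing names with positions once, up front, is what lets later lookups use names instead of remembering offsets such as 13.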