Re: Factor

Factor: the language, the theory, and the practice.

tzfile

Thursday, November 28, 2013

#files #parsing #time

I have wanted to parse timezone information files (also known as “tzfile”) for a while, in particular so that Factor can begin to support named timezones in a smarter way.

Parsing

A tzfile is a binary file from the tz database (also known as the “zoneinfo database”). Each tzfile starts with the four magic bytes “TZif”, which we can check:

ERROR: bad-magic ;

: check-magic ( -- )
    4 read "TZif" sequence= [ bad-magic ] unless ;
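
As a quick sanity check, the word can be exercised against an in-memory byte array instead of a real file. This is only a sketch: it assumes check-magic from above is already loaded in the listener, and it uses with-byte-reader from io.streams.byte-array to stand in for a file stream.

```factor
USING: io.encodings.binary io.streams.byte-array ;

! The bytes 84 90 105 102 spell "TZif", so this returns quietly.
B{ 84 90 105 102 } binary [ check-magic ] with-byte-reader

! Any other four-byte prefix (here "ABCD") should throw bad-magic.
B{ 65 66 67 68 } binary [ check-magic ] with-byte-reader
```

Because check-magic consumes the four bytes it inspects, whatever reads next starts at the fifth byte of the stream.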

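The magic bytes are followed by a header of record counts (the 16 reserved bytes cover the one-byte format version plus 15 unused bytes), and then by the data itself, which includes a series of ttinfo structures: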

STRUCT: tzhead
    { tzh_reserved char[16] }
    { tzh_ttisgmtcnt be32 }
    { tzh_ttisstdcnt be32 }
    { tzh_leapcnt be32 }
    { tzh_timecnt be32 }
    { tzh_typecnt be32 }
    { tzh_charcnt be32 } ;

PACKED-STRUCT: ttinfo
    { tt_gmtoff be32 }
    { tt_isdst uchar }
    { tt_abbrind uchar } ;
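
One way to check that these definitions line up with the on-disk layout is to ask Factor for their sizes. The snippet below is a sketch that assumes the two structs above are loaded; heap-size comes from alien.c-types.

```factor
USING: alien.c-types prettyprint ;

! 16 reserved bytes plus six 4-byte counts: should print 40.
tzhead heap-size .

! 4 + 1 + 1 bytes with no padding: should print 6.
ttinfo heap-size .
```

The ttinfo records sit back to back in the file at 6 bytes each, which is why that struct is declared with PACKED-STRUCT: rather than STRUCT:, where alignment padding could make it larger.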

We can store all the information parsed from the tzfile in a tuple:

TUPLE: tzfile header transition-times local-times types abbrevs
leaps is-std is-gmt ;

C: <tzfile> tzfile

With a helper word to read 32-bit big-endian numbers, we can parse the entire file:

: read-be32 ( -- n )
    4 read be32 deref ;

: read-tzfile ( -- tzfile )
    check-magic tzhead read-struct dup {
        [ tzh_timecnt>> [ read-be32 ] replicate ]
        [ tzh_timecnt>> [ read1 ] replicate ]
        [ tzh_typecnt>> [ ttinfo read-struct ] replicate ]
        [ tzh_charcnt>> read ]
        [ tzh_leapcnt>> [ read-be32 read-be32 2array ] replicate ]
        [ tzh_ttisstdcnt>> read ]
        [ tzh_ttisgmtcnt>> read ]
    } cleave <tzfile> ;

All of that data specifies a series of local time types and transition times:

TUPLE: local-time gmt-offset dst? abbrev std? gmt? ;

C: <local-time> local-time

TUPLE: transition seconds timestamp local-time ;

C: <transition> transition

The abbreviated local time names are stored in a flattened array of NUL-terminated strings. It would be helpful to parse them out into a hashtable where the key is the index at which each name starts in the flattened array:

:: tznames ( abbrevs -- assoc )
    0 [
        0 over abbrevs index-from dup
    ] [
        [ dupd abbrevs subseq >string 2array ] keep 1 + swap
    ] produce 2nip >hashtable ;
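
To see what tznames produces, it can be fed a small hand-made abbreviation block. The input below is made up for illustration (it is not read from a real tzfile), and the sketch assumes tznames from above is loaded.

```factor
USING: assocs prettyprint ;

! "PST" NUL "PDT" NUL as raw bytes, as they might appear in a tzfile.
B{ 80 83 84 0 80 68 84 0 } tznames

! Look up the name that starts at byte offset 4: should print "PDT".
4 swap at .
```

The keys are the offsets 0 and 4, which is the form in which a ttinfo record refers to its abbreviation through tt_abbrind.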

We can now construct an array of all the transition times and the local time types they represent. This is a lot of logic for a typical Factor word, so we use local variables to make it easier to understand:

:: tzfile>transitions ( tzfile -- transitions )
    tzfile abbrevs>> tznames :> abbrevs
    tzfile is-std>> :> is-std
    tzfile is-gmt>> :> is-gmt
    tzfile types>> [
        [
            {
                [ tt_gmtoff>> seconds ]
                [ tt_isdst>> 1 = ]
                [ tt_abbrind>> abbrevs at ]
            } cleave
        ] dip
        [ is-std ?nth dup [ 1 = ] when ]
        [ is-gmt ?nth dup [ 1 = ] when ] bi <local-time>
    ] map-index :> local-times
    tzfile transition-times>>
    tzfile local-times>> [
        [ dup unix-time>timestamp ] [ local-times nth ] bi*
        <transition>
    ] 2map ;

We want to wrap the parsed tzfile structure and the transitions in a tzinfo object that can be used later with timestamps. A tzinfo object is created by parsing a specific file, given either its path or its zoneinfo name:

TUPLE: tzinfo tzfile transitions ;

C: <tzinfo> tzinfo

: file>tzinfo ( path -- tzinfo )
    binary [
        read-tzfile dup tzfile>transitions <tzinfo>
    ] with-file-reader ;

: load-tzinfo ( name -- tzinfo )
    "/usr/share/zoneinfo/" prepend file>tzinfo ;

Timestamps

Now that we have the tzinfo, we can convert a UTC timestamp into the timezone specified by our tzfile. This is accomplished by finding the most recent transition at or before the requested timestamp (or the last transition, if none come after it) and adjusting by the GMT offset that it represents:

: find-transition ( timestamp tzinfo -- transition )
    [ timestamp>unix-time ] [ transitions>> ] bi*
    [ [ seconds>> before? ] with find drop ]
    [ swap [ 1 [-] swap nth ] [ last ] if* ] bi ;

: from-utc ( timestamp tzinfo -- timestamp' )
    [ drop instant >>gmt-offset ]
    [ find-transition local-time>> gmt-offset>> ] 2bi
    convert-timezone ;
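
A short usage sketch of from-utc follows. It assumes the words above are loaded and that the system keeps its zoneinfo files under /usr/share/zoneinfo; the date is chosen only for illustration.

```factor
USING: calendar formatting prettyprint ;

! Noon UTC on the day of this post; instant is a zero-length
! duration, so the timestamp carries a UTC offset of zero.
2013 11 28 12 0 0 instant <timestamp>

! Convert to the zone described by the US/Pacific tzfile and print.
"US/Pacific" load-tzinfo from-utc "%c" strftime .
```

Late November falls in standard time for US/Pacific, so the printed time should be eight hours behind the input, provided the installed file contains the relevant transitions. Note that from-utc sets the offset of the timestamp it is given to zero before looking up the transition, so it should only be handed timestamps that are already in UTC.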

We can also normalize a timestamp that might be in a different timezone into the timezone specified by our tzfile, by converting it into UTC and then back out:

: normalize ( timestamp tzinfo -- timestamp' )
    [ instant convert-timezone ] [ from-utc ] bi* ;

Example

Here is an example of it working. We take a time in PST right at the end of daylight saving time and print it out. Then we subtract 10 minutes and normalize to the “US/Pacific” zoneinfo file. The result falls before the transition, so it prints as a time in PDT:

IN: scratchpad ! Take a time in PST
               2002 10 27 1 0 0 -8 hours <timestamp>

               ! Print it out
               dup "%c" strftime .
"Sun Oct 27 01:00:00 2002"

IN: scratchpad ! Subtract 10 minutes
               10 minutes time-

               ! Normalize to the US/Pacific timezone
               "US/Pacific" load-tzinfo normalize

               ! Print it out
               "%c" strftime .
"Sun Oct 27 01:50:00 2002"

The code for this is available in the development version of Factor.