Working with CGI: Part 3
Wednesday, February 3, 2010
In Part 2, we implemented simple parsing of QUERY_STRING and handling of the GET request method.
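In reaching that conclusion, I skipped over an important convention of HTTP application development: GET requests should be idempotent, meaning that making the same request several times should have the same effect as making it once. Because of this, as well as privacy concerns (GET parameters appear in the URL, where they end up in browser history and server logs), it is common practice to submit HTML forms with a POST request.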
Under the POST convention, the request data is placed in the message body and is usually “URL-encoded” (as described in RFC 2396). This is similar to how certain characters are “escaped” when included in a URL (for example, spaces are represented by %20; form data may also use + for a space).
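As a quick illustration of that escaping, the snippet below uses url-decode from Factor’s urls.encoding vocabulary (a library word, not something defined in this series) to undo a %20 escape. It is only a sketch of the decoding step, not part of the CGI code we are building.

```factor
USING: kernel urls.encoding ;

! url-decode turns each %XX escape back into the character it encodes
"hello%20world" url-decode "hello world" assert=
```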
To properly parse these POST requests, we will need to examine a few other environment variables that are provided to the CGI script.
First, we need a way to parse “content types”. As described in RFC 2616 (the specification for HTTP/1.1), a content type consists of a media type (often called a mime-type) and optional parameters.
For example, a server can specify that the response is HTML encoded as UTF-8 by including the following in the HTTP response headers:
Content-Type: text/html; charset=utf-8
To parse this, we can simply split off the media type and parse the parameters:
: (content-type) ( string -- params media/type )
    ";" split unclip [
        [ H{ } clone ] [ first (query-string) ] if-empty
    ] dip ;
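To see the intermediate steps, here is a self-contained variant that can be tried in the listener. It assumes query>assoc from Factor’s urls.encoding vocabulary as a stand-in for the (query-string) word from Part 2, and the name parse-content-type is hypothetical. Unlike (content-type) above, it trims whitespace around each piece and joins all of the parameters before parsing, so a header carrying several parameters is handled too.

```factor
USING: ascii assocs kernel sequences splitting urls.encoding ;
IN: content-type-demo

! Hypothetical variant of (content-type). Assumption: query>assoc
! stands in for Part 2's (query-string). Each ";"-separated piece
! is trimmed, and every parameter (not just the first) is parsed.
: parse-content-type ( string -- params media/type )
    ";" split [ [ blank? ] trim ] map
    unclip [ "&" join query>assoc ] dip ;
```

For “text/html; charset=utf-8”, this should leave an assoc mapping “charset” to “utf-8” beneath the string “text/html”, matching the stack effect of (content-type).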
When a form is submitted with POST, its values are sent to the CGI script in the message body with a mime-type of “application/x-www-form-urlencoded”. (Technically, some parameters could also be specified in the URL’s query string.)
We can define a word that parses CONTENT_LENGTH, reads that many bytes from standard input, and then assembles and parses the URL-encoded query string:
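: (urlencoded) ( -- assoc )
    ! CONTENT_LENGTH gives the size of the body on standard input
    "CONTENT_LENGTH" os-env "0" or string>number
    read [ "" ] [ "&" append ] if-empty
    ! merge in any parameters that were also given in the URL
    "QUERY_STRING" os-env [ append ] when* (query-string) ;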
These two words are sufficient to parse most POST requests. However, it’s worth noting that some forms are submitted with a mime-type of “multipart/form-data”, which is used for uploading files to servers. For now, we will define a placeholder word to remind us to come back to this:
: (multipart) ( -- assoc )
    "multipart unsupported" throw ;
Now that we have that, we can implement the parsing routine:
: parse-post ( -- assoc )
    ! dispatch on the media type, then nip the unused params
    "CONTENT_TYPE" os-env "" or (content-type) {
        { "multipart/form-data" [ (multipart) ] }
        { "application/x-www-form-urlencoded" [ (urlencoded) ] }
        [ drop parse-get ]
    } case nip ;
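Because (urlencoded), which does the real work for URL-encoded POST requests, only reads environment variables and standard input, it can be exercised without a web server by supplying those inputs by hand. The sketch below assumes query>assoc (from urls.encoding) in place of Part 2’s (query-string); read-urlencoded and simulate-post are hypothetical names used only here. set-os-env supplies the variables, and with-string-reader serves the request body in place of standard input.

```factor
USING: assocs environment io io.streams.string kernel math.parser
sequences urls.encoding ;
IN: cgi-post-demo

! Same steps as (urlencoded), but query>assoc stands in for
! Part 2's (query-string) (an assumption of this sketch).
: read-urlencoded ( -- assoc )
    "CONTENT_LENGTH" os-env "0" or string>number
    read [ "" ] [ "&" append ] if-empty
    "QUERY_STRING" os-env [ append ] when* query>assoc ;

! Hypothetical helper: set the variables a web server would provide,
! then serve the body from a string stream instead of standard input.
: simulate-post ( body query-string -- assoc )
    "QUERY_STRING" set-os-env
    [ length number>string "CONTENT_LENGTH" set-os-env ]
    [ [ read-urlencoded ] with-string-reader ] bi ;
```

Calling "name=Alice&age=30" "lang=en" simulate-post should produce an assoc containing both body parameters and the lang parameter from the query string. Note that set-os-env changes the environment of the running Factor process, so this is best kept to a scratch session or test.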
Then we can extend our <cgi-form> word to handle POST requests:
: <cgi-form> ( -- assoc )
    "REQUEST_METHOD" os-env "GET" or >upper {
        { "GET" [ parse-get ] }
        { "POST" [ parse-post ] }
        [ "Unknown request method" throw ]
    } case ;
And it’s as simple as that: we can now change our form method from “get” to “post”, and our CGI scripts will continue to work.