Filename Sanitization
Friday, June 13, 2014
I came across the Zaru project, which provides filename sanitization for Ruby. You can learn a bit about filenames by reading the article on Wikipedia. I thought it would be nice to implement something like this in Factor.
The rules for sanitization are relatively simple, so I will list and then implement each one:
1. Certain characters generally don’t mix well with certain file systems, so we filter them:
: filter-special ( str -- str' )
[ "/\\?*:|\"<>" member? not ] filter ;
2. ASCII control characters (0x00 to 0x1f) are not usually a good idea, either:
: filter-control ( str -- str' )
[ control? not ] filter ;
3. Unicode whitespace is trimmed from the beginning and end of the filename and collapsed to a single space within the filename:
: filter-blanks ( str -- str' )
[ blank? ] split-when harvest " " join ;
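To see why this trims and collapses in one pass, it may help to look at the intermediate values. The sketch below uses tools.test and assumes a recent Factor release where blank? lives in the unicode vocabulary (older releases keep it in unicode.categories); the sample string is made up for illustration.

```factor
USING: sequences splitting tools.test unicode ;

! Splitting on every blank leaves an empty piece wherever a blank was
! leading, trailing, or next to another blank; harvest drops those.
{ { "a" "b" } } [ "  a   b  " [ blank? ] split-when harvest ] unit-test

! Joining the surviving pieces with a single space gives the result.
{ "a b" } [ { "a" "b" } " " join ] unit-test
```

Because the empty pieces are discarded before joining, leading and trailing whitespace disappears and interior runs shrink to one space.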
4. Certain filenames are reserved on Windows and are filtered (substituting a “file” placeholder name):
: filter-windows-reserved ( str -- str' )
dup >upper {
"CON" "PRN" "AUX" "NUL" "COM1" "COM2" "COM3" "COM4"
"COM5" "COM6" "COM7" "COM8" "COM9" "LPT1" "LPT2" "LPT3"
"LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9"
} member? [ drop "file" ] when ;
5. Empty filenames are not allowed and are replaced with “file”:
: filter-empty ( str -- str' )
[ "file" ] when-empty ;
6. Filenames that begin with a “dot” character are prefixed with “file”:
: filter-dots ( str -- str' )
dup first CHAR: . = [ "file" prepend ] when ;
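Putting it all together, and truncating the filename to at most 255 characters. The order matters: filter-dots calls first, which fails on an empty string, so filter-empty has to run before it: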
: sanitize-path ( path -- path' )
filter-special
filter-control
filter-blanks
filter-windows-reserved
filter-empty
filter-dots
255 short head ;
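A few checks make the combined behavior concrete. This is only a sketch, not the tests from the repository: it assumes the words above are saved in a vocabulary hypothetically named sanitize, and that its source file starts with something like `USING: kernel sequences splitting unicode ;` (on older Factor releases, unicode.categories and unicode.case instead of unicode). The inputs are invented for illustration.

```factor
! Hypothetical test file; the vocabulary name "sanitize" is an assumption.
USING: sanitize sequences strings tools.test ;

! Empty and Windows-reserved names fall back to the placeholder.
{ "file" } [ "" sanitize-path ] unit-test
{ "file" } [ "con" sanitize-path ] unit-test

! Dotfiles keep their name but gain the "file" prefix.
{ "file.bashrc" } [ ".bashrc" sanitize-path ] unit-test

! Special characters are dropped; whitespace is trimmed and collapsed.
{ "abc" } [ "a/b:c" sanitize-path ] unit-test
{ "a b.txt" } [ "  a   b.txt  " sanitize-path ] unit-test

! Long names are cut to 255 characters.
{ 255 } [ 300 CHAR: a <string> sanitize-path length ] unit-test
```

Note that the reserved-name check is case-insensitive because the name is upper-cased before the lookup, which is why "con" is replaced as well.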
The code for this (and some tests) is on my GitHub.