Magic
Sunday, February 12, 2023
Ever wonder what the type of a particular binary file is? Or wonder how a program knows that a particular binary file is in a compatible file format? One way is to look at the magic number used by the file format in question. You can see some examples in a list of file signatures.
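To make the idea concrete, here is a minimal sketch in Factor that checks one well-known signature by hand. The word names file-signature and png? are made up for this illustration and are not part of any library. The eight-byte PNG signature is the standard one.

```factor
USING: io io.encodings.binary io.files kernel ;
IN: scratchpad

! Hypothetical helper: read up to the first n bytes of a file.
! Returns a byte-array, or f if the file is empty.
: file-signature ( path n -- bytes/f )
    swap binary [ read ] with-file-reader ;

! PNG files begin with the eight bytes 89 50 4E 47 0D 0A 1A 0A.
: png? ( path -- ? )
    8 file-signature
    B{ 0x89 0x50 0x4e 0x47 0x0d 0x0a 0x1a 0x0a } = ;
```

A single comparison like this only recognizes one format. Tools such as file consult a whole database of these signatures, along with other heuristics.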
On most Unix systems, the file command is built on the libmagic library, which uses magic numbers and other techniques to identify file types (Apple macOS is an exception and has its own implementation). You can see how it works through a few examples:
$ file vm/factor.hpp
vm/factor.hpp: C++ source text, ASCII text
$ file Factor.app/Contents/Info.plist
Factor.app/Contents/Info.plist: XML document text
$ file factor
factor: Mach-O 64-bit executable x86_64
$ file factor.image
factor.image: data
Wrapping the C library
I am going to show how to wrap a C library using the alien vocabulary, which provides an FFI capability in Factor. The man pages for libmagic show us some of the functions available in magic.h.
The libmagic library needs to be made available to the Factor instance:
"magic" {
    { [ os macosx? ] [ "libmagic.dylib" ] }
    { [ os unix? ] [ "libmagic.so" ] }
} cond cdecl add-library
We start by naming the library that the following declarations belong to and defining an opaque type for magic_t:
LIBRARY: magic
TYPEDEF: void* magic_t
Some functions are available for opening, loading, and then closing the magic_t:
FUNCTION: magic_t magic_open ( int flags )
FUNCTION: int magic_load ( magic_t magic, c-string path )
FUNCTION: void magic_close ( magic_t magic )
It is convenient to wrap the close function as a destructor for use in a with-destructors form.
DESTRUCTOR: magic_close
The magic_file function “returns a textual description of the contents of the filename argument”, which gives us the same ability as the file command above:
FUNCTION: c-string magic_file ( magic_t magic, c-string path )
That should be everything we need to continue…
Using the C library
Now that we have the raw C library made available as Factor words, we can create a simpler interface by combining them into a single word that guesses the type of a file:
: guess-file ( path -- result )
    [
        normalize-path
        0 magic_open &magic_close
        [ f magic_load drop ]
        [ swap magic_file ] bi
    ] with-destructors ;
And we can then try it on a few files:
IN: scratchpad "vm/factor.hpp" guess-file .
"C++ source, ASCII text"
IN: scratchpad "Factor.app/Contents/Info.plist" guess-file .
"XML 1.0 document, Unicode text, UTF-8 text"
IN: scratchpad "factor" guess-file .
"symbolic link to Factor.app/Contents/MacOS/factor"
IN: scratchpad "factor.image" guess-file .
"data"
This has been available for a while in the magic vocabulary, with improved error checking and some options to guess the MIME type of files.
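As a rough idea of what flags and error checking could look like, here is a self-contained sketch. It is not the source of the magic vocabulary. The vocabulary name magic-sketch and the words magic-fail, guess-file-with, and guess-mime-type are hypothetical names chosen for this example. The library setup is copied from the post above, the magic_error function and the flag values are taken from magic.h, and the USING: list is my best guess at the vocabularies needed.

```factor
USING: alien alien.c-types alien.destructors alien.libraries
alien.syntax combinators destructors io.pathnames kernel system ;
IN: magic-sketch

! Library setup and declarations, repeated from the post above.
"magic" {
    { [ os macosx? ] [ "libmagic.dylib" ] }
    { [ os unix? ] [ "libmagic.so" ] }
} cond cdecl add-library

LIBRARY: magic

TYPEDEF: void* magic_t

FUNCTION: magic_t magic_open ( int flags )
FUNCTION: int magic_load ( magic_t magic, c-string path )
FUNCTION: c-string magic_file ( magic_t magic, c-string path )
FUNCTION: c-string magic_error ( magic_t magic )
FUNCTION: void magic_close ( magic_t magic )

DESTRUCTOR: magic_close

! A few of the flag values defined in magic.h.
CONSTANT: MAGIC_NONE 0x000
CONSTANT: MAGIC_SYMLINK 0x002   ! follow symbolic links
CONSTANT: MAGIC_MIME_TYPE 0x010 ! report a MIME type string

! Hypothetical helper: throw libmagic's message for this handle.
: magic-fail ( magic -- * )
    magic_error throw ;

! Like guess-file, but takes the flags as an argument and checks
! the results of magic_load (0 on success) and magic_file
! (NULL, which Factor sees as f, on failure).
: guess-file-with ( path flags -- result )
    [ normalize-path ] dip
    [
        magic_open &magic_close
        dup f magic_load 0 = [ dup magic-fail ] unless
        [ swap magic_file ] keep
        over [ drop ] [ magic-fail ] if
    ] with-destructors ;

: guess-mime-type ( path -- result )
    MAGIC_MIME_TYPE guess-file-with ;
```

With these definitions loaded, something like "vm/factor.hpp" guess-mime-type . should print a MIME type string rather than a prose description. Passing MAGIC_SYMLINK to guess-file-with asks libmagic to follow symbolic links, which is relevant to the "symbolic link to" result shown earlier.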