Text-to-Speech

Saturday, October 19, 2013

To those of us from a certain decade of computing, the phrase “text-to-speech” reminds us favorably of Dr. Sbaitso. A fun take on that reminiscence is an article titled “Dr. Sbaitso was my only friend”.

It would be neat if we could have access to text-to-speech functionality from Factor. And it would be especially neat if it were cross-platform!

We’ll start by defining speak-text as a hook: a generic word that dispatches on the value of the os variable, so we can provide a platform-specific implementation for each operating system:

HOOK: speak-text os ( str -- )
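To see hook dispatch in isolation, here is a minimal sketch that uses a made-up variable instead of os. The names shape, circle, square, and describe are hypothetical and exist only for this illustration. The method that runs is chosen by the value the variable holds when the word is called:

```factor
USING: io kernel namespaces ;
IN: scratchpad

! Hypothetical names for illustration only.
SYMBOL: shape
SINGLETONS: circle square ;

! A hook word dispatches on the current value of a variable
! rather than on a value taken from the stack.
HOOK: describe shape ( -- str )

M: circle describe "round" ;
M: square describe "boxy" ;

! Bind the variable for the duration of the quotation, then call the hook.
circle shape [ describe ] with-variable print
square shape [ describe ] with-variable print
```

speak-text works the same way, except that the os variable is already set by Factor to a singleton such as macosx, linux, or windows.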

Mac OS

On Mac OS, we cheat a bit and just call out to the built-in say command-line tool:

M: macosx speak-text
    "say \"%s\"" sprintf try-process ;

This uses the default voice set in System Preferences. The say tool accepts many other options, including choosing a different voice (-v) and adjusting the speaking rate in words per minute (-r). For more information on Mac OS support for speech, read the Speech Synthesis Programming Guide.
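As a rough sketch of how the voice and rate could be exposed, the words below pass the -v and -r options of say. The names say-command and speak-text-with are hypothetical and not part of the vocabulary described here, and the example assumes a voice named "Alex" is installed (running say -v '?' lists the available voices). The command is built as a sequence of arguments rather than a single string, which also means text containing a double quote is passed through intact, something the sprintf-based version above would not handle:

```factor
USING: io.launcher kernel locals math.parser ;
IN: scratchpad

! Hypothetical helper: build the argument list for say(1).
! Each element is passed as one argument, so no quoting of str is needed.
:: say-command ( str voice rate -- args )
    rate number>string :> rate-str
    { "say" "-v" voice "-r" rate-str str } ;

! Hypothetical word: speak str with the given voice and words-per-minute rate.
: speak-text-with ( str voice rate -- )
    say-command try-process ;
```

For example, "Hello from Factor" "Alex" 180 speak-text-with would ask say to use the Alex voice at roughly 180 words per minute.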

Linux

On Linux, text-to-speech is not built in. Instead, I decided to use the Festival Speech Synthesis System, which includes a command-line tool that can be configured to speak text:

M: linux speak-text
    "festival --tts" utf8 [ print ] with-process-writer ;

Festival can do much more than this; the Festival manual documents its many other features.

Windows

On Windows, it would probably be cool to bind to the Microsoft Speech API, but that seemed a little bit harder than the quick-and-dirty approach I took.

Support required two commits to the main Factor repository by Doug Coleman and myself:

google.translate: adding translate-tts - using Google Translate to “speak” text to an MP3
windows.winmm: Add binding to play mp3s - using WinMM to play MP3 files

Those two commits allow us to implement speak-text on Windows:

M: windows speak-text
    translate-tts open-command play-command close-command ;

The code for this is available on my GitHub.