Text-to-Speech
Saturday, October 19, 2013
For those of us from a certain decade of computing, the phrase “text-to-speech” brings back fond memories of Dr. Sbaitso. A fun take on that nostalgia is an article titled “Dr. Sbaitso was my only friend”.
It would be neat to have text-to-speech functionality in Factor, and especially neat if it were cross-platform!
We’ll start by defining speak-text as a hook: a generic word that dispatches on the value of the os variable (rather than on an object on the stack), so that we can provide platform-specific implementations:
HOOK: speak-text os ( str -- )
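To see the hook mechanism on its own, here is a minimal sketch using a hypothetical platform-name word that is not part of the original code. It assumes the OS class names used in this post (macosx, linux, windows); newer Factor releases may spell the Mac OS class differently. Each M: method is selected by the current value of os, not by anything on the data stack, which is why the methods take no input.

```factor
USING: io system ;
IN: hook-demo

! Hypothetical example word: like speak-text, it dispatches on
! the value of the os variable rather than on the data stack.
HOOK: platform-name os ( -- str )

! One method per platform class; none of them consume stack input.
M: macosx platform-name "Mac OS" ;
M: linux platform-name "Linux" ;
M: windows platform-name "Windows" ;
```

Evaluating platform-name print in the listener should print the name of the platform Factor is running on; on a platform with no matching method, calling the word raises a no-method error.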
Mac OS
On Mac OS, we cheat a bit and just call out to the say command-line tool built into Mac OS:
! Pass the text as a separate argument so quotes inside it need no escaping
M: macosx speak-text
"say" swap 2array try-process ;
We use the default voice set in System Preferences, but say has many other options, including choosing a different voice (-v) and adjusting the speaking rate in words per minute (-r). For more information on Mac OS support for speech, read the Speech Synthesis Programming Guide.
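As a sketch of how those options could be passed from Factor, the hypothetical words below (not part of the original vocabulary) build a say command with a chosen voice (-v) and a rate in words per minute (-r). The command is an array of strings rather than one formatted string, so the text reaches say as a single argument. The voice name in the usage line is an assumption; running say -v ? in a terminal lists the voices installed on your machine.

```factor
USING: arrays io.launcher kernel math.parser sequences ;
IN: say-options

! Hypothetical helper: build { "say" "-v" voice "-r" wpm str }.
: say-command ( str voice wpm -- seq )
    number>string "-r" swap 2array   ! str voice { "-r" wpm }
    [ "-v" swap 2array ] dip append  ! str { "-v" voice "-r" wpm }
    "say" prefix                     ! str { "say" "-v" voice "-r" wpm }
    swap suffix ;                    ! { "say" "-v" voice "-r" wpm str }

! Speak str with the given voice and words-per-minute rate.
: say-with ( str voice wpm -- )
    say-command try-process ;
```

For example, "Hello from Factor" "Alex" 200 say-with would ask say to use the Alex voice at 200 words per minute, assuming that voice is installed.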
Linux
On Linux, text-to-speech is not built in. Instead, I decided to use the Festival Speech Synthesis System, whose festival command-line tool, when run with the --tts option, speaks the text it reads from standard input:
M: linux speak-text
"festival --tts" utf8 [ print ] with-process-writer ;
Festival can do much more than this; the Festival manual describes a whole host of other features.
Windows
On Windows, it would probably be cool to bind to the Microsoft Speech API, but that seemed a bit harder than the quick-and-dirty approach I took instead: have Google Translate produce an MP3 of the text and then play it.
Support required two commits to the main Factor repository, by Doug Coleman and me:
- google.translate: adding translate-tts (uses Google Translate to “speak” text into an MP3 file)
- windows.winmm: Add binding to play mp3s (uses WinMM to play MP3 files)
Those two commits allow us to implement speak-text on Windows:
M: windows speak-text
translate-tts open-command play-command close-command ;
The code for this is available on my GitHub.