Automating DECtalk
03/24/2017 at 23:36
The next step was to automate 80speak:
- User submits text through POST request
- Text is received by Python Flask API, and passed to say.exe for synthesis
- Say.exe outputs to a .WAV file, which is converted by PyDub to mp3.
- Link to .mp3 is returned by API
- Page is updated with inline mp3 player, which plays back the speech automatically
- Speech is made available for download
Before I continue, the Python/Flask API is located on port 5000 of 80speak.com, and the endpoint http://80speak.com:5000/send_text accepts a POST request containing the following JSON:
{"message":"Your text to speak!"}
Returned is a link to an MP3 containing your synthesized speech! Feel free to use this to make embedded systems like Raspberry Pi speak!
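As a rough sketch of calling the endpoint described above from a script (for example on a Raspberry Pi), the following uses only Python 3's standard library. The helper names `build_request` and `request_speech` are made up for illustration, and since the exact response format isn't documented here, the sketch simply returns the response body as text.

```python
import json
import urllib.request

API_URL = "http://80speak.com:5000/send_text"  # endpoint given above


def build_request(message, url=API_URL):
    """Build a POST request carrying {"message": ...} as a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_speech(message, url=API_URL, timeout=30):
    """Send the text and return the raw response body as a string.

    Assumption: the body contains the MP3 link in some form; the exact
    format is not specified in this write-up.
    """
    request = build_request(message, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `request_speech("Your text to speak!")` should then hand back whatever the API returns, which can be parsed for the MP3 link once the response format is known.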
Cheating Mobile Devices
To make this process happen behind the scenes in a smooth way, I first wanted the user to be able to stay on a single page. This meant using jQuery and some clever HTML5 tricks to return an auto-playing result on the same page you requested it from.
Doing this was pretty simple: make a POST request, wait for the MP3 link to come back, and spawn a hidden auto-playing HTML5 Audio player on the page. However, mobile devices don't allow auto-playing by default; they require the user to interact directly with an Audio object before it can be controlled with code. This is to prevent malicious websites from spawning a hundred auto-playing, silent MP3s off-screen just to eat your data/bandwidth in a small-scale DDoS attack. Or something like that.
I found out that HTML5 audio CAN be auto-played on mobile by JS AFTER the user has already manually started a previous audio object playing. To cheat this system and allow mobile devices to participate in the same way, a split-second silent MP3 is played when the user presses the "SAY IT!" button. That button is designated as the play/pause control for the silent audio. The silent file plays quickly in the background, and the mobile browser now allows us to play the speech audio automatically after it returns! Aha! There's a good Hack of the Day here.
Server Side
The Flask API parses the text POSTed to it by the website, and passes it to SAY.EXE for synthesis. Because SAY.EXE is a Windows binary, it has to be run under WINE. Easy enough.
$ wine say.exe
Application tried to create a window, but no driver could be loaded.
Make sure that your X server is running and that $DISPLAY is set correctly.
Ah, okay. Needs something for a display to run. Xvfb to the rescue! We create a fake display at 1024x768 for WINE to use.
Xvfb :0 -screen 0 1024x768x16 &
This was added as an "@reboot" to the crontab to make sure the display runs when we start the machine. Now our WINE command looks like this:
$ DISPLAY=:0.0 wine say.exe -w WAVE_FILE.wav "Our message goes here!"
It works! WAVE_FILE.wav now contains a recording of DECtalk saying "Our message goes here!". Time to automate with Python's os.system() command:
import os
import uuid

from pydub import AudioSegment

# try_rm() and shellquote() are helper functions that are not shown in this post.

def convert_to_speech(message):
    mid = str(uuid.uuid4()).replace("-", "")  # unique ID for this request
    print("----------------------------------------")
    print("SPEECH CONVERSION\n")
    print("MID: " + mid)
    print("MESSAGE: " + message)
    print("Converting to speech...")
    wav_file = "/wav/" + mid + ".wav"
    out_file = "/mp3/" + mid + ".mp3"      # DOES NOT EXIST YET
    mp3_file = "/var/www/html" + out_file  # DOES NOT EXIST YET
    try_rm(wav_file)  # Deletes if exists
    try_rm(mp3_file)
    command = "DISPLAY=:0.0 wine say.exe -w " + wav_file + " " + shellquote(message)
    print(command)
    os.system(command)
    print("Converting to mp3...")
    sound = AudioSegment.from_file(wav_file, format="wav")
    loud = sound + 3  # boost the volume by 3 dB
    loud.export(mp3_file, bitrate='64k', format="mp3")
    print("DONE!")
    print("----------------------------------------")
    return out_file  # web path of the MP3, relative to the site root
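The `try_rm` and `shellquote` helpers are not shown in this post. Here is a minimal sketch of what they might look like, assuming Python 3 and only the standard library; the versions actually used on 80speak may well differ.

```python
import os
import shlex


def try_rm(path):
    """Delete the file at `path` if it exists; do nothing otherwise."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def shellquote(text):
    """Quote `text` so a POSIX shell treats it as one literal argument."""
    return shlex.quote(text)
```

Quoting matters here because the message comes straight from the public: without it, text containing characters like `;` or `$(...)` would be interpreted by the shell that os.system() starts.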
This function is called by the Flask API "send_text" endpoint, and returns the path to an MP3 version of the speech. This MP3 is spawned in the user's page as a hidden Audio object, and automatically plays on both desktop and mobile thanks to the audio button cheat!
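As an alternative to building a shell string for os.system(), the same WINE invocation could be run with the standard-library subprocess module, passing the message as its own argument so that no shell quoting is needed. This is a sketch under assumptions, not the code 80speak runs: the names `build_say_command` and `run_say` are hypothetical, Python 3 is assumed, and the `-w` flag and the `DISPLAY=:0.0` value are taken from the command shown earlier.

```python
import os
import subprocess


def build_say_command(wav_path, message, display=":0.0"):
    """Return (argv, env) for running say.exe under WINE on the Xvfb display."""
    argv = ["wine", "say.exe", "-w", wav_path, message]
    # Copy the current environment and point DISPLAY at the fake display.
    env = dict(os.environ, DISPLAY=display)
    return argv, env


def run_say(wav_path, message):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    argv, env = build_say_command(wav_path, message)
    subprocess.run(argv, env=env, check=True)
```

Because the argument list is handed to the program directly, the message never passes through a shell, and `check=True` surfaces a failed run instead of silently continuing to the MP3 conversion step.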
Finding DECtalk
03/24/2017 at 21:55
One of the biggest challenges in creating 80speak was sourcing the software to emulate the famous voice of Professor Hawking and attaching it to a web server that the public can use. I possibly could have found original DECtalk hardware and tied that to a Raspberry Pi for control, but this would trade off speed for only a slight authenticity gain. Unlike the original hardware, purely software-based DECtalk instances can be run in parallel, and can produce the speech much faster than it can be spoken. A hardware solution would have to receive a command, capture 1-5000 words of speech in real time, and return the recording to that specific user before processing the next phrase.
Finding DECtalk Software
Finding a DECtalk demo is actually pretty easy. The most commonly distributed version is "SPEAK.EXE", which is a GUI allowing you to have one of ten voices speak any text you write in. (The one we need is the default "Perfect Paul" voice.)
However, this won't do. On a Windows machine you could generate macros to control the GUI automatically, but on a headless Linux server you need something a little more command line-based.
Eventually I found a distribution of DECtalk that came with exactly what I needed: "SAY.EXE". This is a command line-only version of DECtalk, which would allow me to automate the speech generation process! If run with no arguments, it reads whatever is typed into STDIN. If provided with quoted text after the exe, it will read that aloud and then quit.
Next up was capturing the audio to a file, but luckily the SAY executable will allow you to write the output to .WAV format directly, so this saved me a step!