The next step was to automate 80speak:
- User submits text through POST request
- Text is received by Python Flask API, and passed to say.exe for synthesis
- Say.exe outputs to a .WAV file, which is converted by PyDub to mp3.
- Link to .mp3 is returned by API
- Page is updated with inline mp3 player, which plays back the speech automatically
- Speech is made available for download
Before I continue, the Python/Flask API is located on port 5000 of 80speak.com, and the endpoint http://80speak.com:5000/send_text accepts a POST request containing the following JSON:
{"message":"Your text to speak!"}
Returned is a link to an MP3 containing your synthesized speech! Feel free to use this to make embedded systems like Raspberry Pi speak!
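For anyone who wants to call the API from a script (on a Raspberry Pi, say), here is a minimal Python 3 sketch using only the standard library. The helper names `build_request` and `fetch_speech_link` are made up for illustration, and it assumes the response body is simply the MP3 link as text, since the exact response format isn't spelled out above.

```python
import json
import urllib.request

API_URL = "http://80speak.com:5000/send_text"  # endpoint named in the post


def build_request(message, url=API_URL):
    """Build (but do not send) a JSON POST request carrying the text to speak."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_speech_link(message, timeout=30):
    """Send the request and return the response body as text.

    Assumption: the body is the MP3 link as plain text; adjust the
    parsing if the API wraps the link in JSON instead.
    """
    with urllib.request.urlopen(build_request(message), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `fetch_speech_link("Your text to speak!")` should then hand back whatever link the API returns, ready to download or pass to an audio player.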
Cheating Mobile Devices
To make this process happen behind the scenes in a smooth way, I first wanted the user to be able to stay on a single page. This meant using jQuery and some clever HTML5 tricks to return an auto-playing result on the same page you requested it from.
Doing this was pretty simple: make a POST request, wait for the MP3 link to come back, and spawn a hidden auto-playing HTML5 Audio player on the page. However, mobile devices don't allow auto-playing by default. They require the user to interact directly with an Audio object before it can be controlled with code. This is to prevent malicious websites from spawning a hundred auto-playing, silent MP3s off-screen just to eat your data/bandwidth in a small-scale DDOS attack. Or something like that.
I found out that HTML5 audio CAN be auto-played on mobile by JS AFTER the user has already manually started a previous audio object playing. To cheat this system and allow mobile devices to participate in the same way, a split-second silent MP3 is played when the user presses the "SAY IT!" button. That button is designated as the play/pause control for the silent audio. The silent file plays quickly in the background, and the mobile browser now allows us to play the speech audio automatically after it returns! Aha! There's a good Hack of the Day here.
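The trick needs a split-second silent audio file to exist in the first place. How the original was made isn't covered here, so the following is just one assumed way to produce one: a Python 3 sketch that writes a short silent WAV with the standard `wave` module. The function name, the 0.25 second duration, and the 8 kHz sample rate are arbitrary choices for illustration; the result would still need converting to MP3, for example with PyDub as is done for the speech.

```python
import wave


def write_silent_wav(path, duration_s=0.25, sample_rate=8000):
    """Write a short mono 16-bit PCM WAV file containing only silence."""
    n_frames = int(duration_s * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero amplitude = silence
    return n_frames
```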
Server Side
The Flask API parses the text POSTed to it by the website, and passes it to SAY.EXE for synthesis. Because SAY.EXE is a Windows binary, it has to be run under WINE. Easy enough.
$ wine say.exe
Application tried to create a window, but no driver could be loaded.
Make sure that your X server is running and that $DISPLAY is set correctly.
Ah, okay. Needs something for a display to run. Xvfb to the rescue! We create a fake display at 1024x768 for WINE to use.
Xvfb :0 -screen 0 1024x768x16 &
This was added as an "@reboot" to the crontab to make sure the display runs when we start the machine. Now our WINE command looks like this:
$ DISPLAY=:0.0 wine say.exe -w WAVE_FILE.wav "Our message goes here!"
It works! WAVE_FILE.wav now contains a recording of DECtalk saying "Our message goes here!". Time to automate with Python's os.system() command:
import os
import uuid
from pydub import AudioSegment

# try_rm() and shellquote() are helper functions defined elsewhere in the script.
def convert_to_speech(message):
    # Unique message ID, used to name the output files
    mid = str(uuid.uuid4()).replace("-","")
    print "----------------------------------------"
    print "SPEECH CONVERSION\n"
    print "MID: "+mid
    print "MESSAGE: "+message
    print "Converting to speech..."
    wav_file = "/wav/"+mid+".wav"
    out_file = "/mp3/"+mid+".mp3" # DOES NOT EXIST YET
    mp3_file = "/var/www/html"+out_file # DOES NOT EXIST YET
    try_rm(wav_file) # Deletes if exists
    try_rm(mp3_file)
    # Run say.exe under WINE against the Xvfb display
    command = "DISPLAY=:0.0 wine say.exe -w "+wav_file+" "+shellquote(message)
    print command
    os.system(command)
    print "Converting to mp3..."
    sound = AudioSegment.from_file(wav_file, format="wav")
    loud = sound + 3 # Boost the volume by 3 dB
    loud.export(mp3_file, bitrate='64k', format="mp3")
    print "DONE!"
    print "----------------------------------------"
    # Path of the MP3 relative to the web root
    return out_file
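The os.system() call above depends on the message being quoted correctly for the shell. As an alternative (not what the code above does), here is a minimal Python 3 sketch that hands the same command to `subprocess.run` as an argument list, so the message never passes through a shell at all. The function names `build_say_command` and `run_say` are made up for illustration.

```python
import os
import subprocess


def build_say_command(wav_file, message):
    """Return the argv list equivalent to: wine say.exe -w WAV_FILE "message"."""
    return ["wine", "say.exe", "-w", wav_file, message]


def run_say(wav_file, message, display=":0.0"):
    """Run say.exe under WINE against the Xvfb display, without a shell."""
    # Copy the current environment and point DISPLAY at the fake display.
    env = dict(os.environ, DISPLAY=display)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_say_command(wav_file, message), env=env, check=True)
```

Because each list element reaches the program as a single argument, quotes, semicolons, and dollar signs in the message are passed along literally rather than interpreted.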
The convert_to_speech function is called by the Flask API's "send_text" endpoint and returns the path to an MP3 version of the speech. That MP3 is spawned in the user's page as a hidden Audio object and plays the result automatically on both desktop and mobile, thanks to the audio button cheat!
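The endpoint code itself isn't shown here, so the following is only a rough Python 3 sketch of how such a route might be wired up. The names `extract_message` and `create_app` are hypothetical, the `synthesize` argument stands in for the conversion function, and returning the MP3 path as a plain string (with a 400 error for bad input) is an assumption about the response format.

```python
def extract_message(payload):
    """Return the text to speak from a decoded JSON payload, or None if invalid."""
    if not isinstance(payload, dict):
        return None
    message = payload.get("message")
    if not isinstance(message, str) or not message.strip():
        return None
    return message


def create_app(synthesize):
    """Build a Flask app whose /send_text route calls synthesize(message)."""
    # Imported here so extract_message() stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/send_text", methods=["POST"])
    def send_text():
        # silent=True yields None instead of an error page on malformed JSON.
        message = extract_message(request.get_json(silent=True))
        if message is None:
            return jsonify(error='expected JSON like {"message": "..."}'), 400
        # Assumption: the link is returned as a plain-text response body.
        return synthesize(message)

    return app
```

Something like `create_app(convert_to_speech).run(host="0.0.0.0", port=5000)` would then serve it on port 5000.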