The first prototype uses a BananaPi with three navigation buttons in a repurposed Amazon cardboard box. The Raspi B+ was running near max CPU capacity with the TTS engine, so I switched to the BPi. However, the Raspi 2 that has since been released should have enough power to handle this.
A headset turns out to produce the best audio output. Initially a speaker was installed in the box, but its sound quality combined with the sometimes hard-to-understand TTS was too inconvenient for the test user. An inexpensive 10 Euro Philips headset from Media Markt produces the clearest sound. A rotary encoder is used to adjust the volume.
When turned on, the box first speaks out the temperature and air pressure from an MPL3155 sensor - a feature the user found useful. It then speaks out a list of newspapers. eSpeak is used for this because it has a different voice than the TTS engine. When the name of a paper (s)he wants to hear is read, the user presses a button. The RSS headlines of that paper are then retrieved and read out one by one. If the user wants to know more about a headline, (s)he presses the button again and the summary is read. After another button press the full article is retrieved.
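The headline/summary/link triples that drive this navigation come straight out of each paper's RSS feed. As a rough illustration (the post doesn't say which parser is used; this sketch uses only Python's standard library, and the feed content below is made up):

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 snippet standing in for a real newspaper feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Paper</title>
  <item><title>Headline one</title>
        <description>Summary one.</description>
        <link>http://example.com/1</link></item>
  <item><title>Headline two</title>
        <description>Summary two.</description>
        <link>http://example.com/2</link></item>
</channel></rss>"""

def feed_items(rss_text):
    """Return (headline, summary, link) triples from an RSS 2.0 feed.

    The headline is what gets read out first; the summary is read on the
    second button press; the link is fetched for the full article.
    """
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"),
             item.findtext("description"),
             item.findtext("link"))
            for item in root.iter("item")]
```

Each triple maps onto one step of the button interface: headline, then summary, then full article.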
This is where it gets challenging, because we are now dealing with non-standardized web pages. BeautifulSoup in Python can strip all of the scripting and HTML tags, but there is still a lot of nonsense that gets read out. Therefore, for each featured paper the HTML tags that encapsulate the article text have been identified. This works reasonably well, although for some papers the text that is read still contains duplicate headlines, publication dates, or author names, or picture captions. The navigation footer also turns out to be hard to remove in some cases.
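The per-paper tag approach can be sketched roughly like this. The selector table and its entries are hypothetical (the post doesn't name the actual papers or their tags); the idea is just to look up, per paper, the container that wraps the article body and read text only from inside it:

```python
from bs4 import BeautifulSoup

# Hypothetical per-paper lookup: for each featured paper, the tag and
# attributes that were found to encapsulate the article text.
ARTICLE_SELECTORS = {
    "example-paper": ("div", {"class": "article-body"}),
}

def extract_article(html, paper):
    """Pull readable article text out of a paper's non-standardized page."""
    soup = BeautifulSoup(html, "html.parser")
    tag, attrs = ARTICLE_SELECTORS[paper]
    container = soup.find(tag, attrs=attrs)
    if container is None:
        # Fallback: strip everything and hope for the best.
        return soup.get_text(" ", strip=True)
    # Remove leftover junk inside the container before extracting text.
    for junk in container(["script", "style", "footer"]):
        junk.decompose()
    return container.get_text(" ", strip=True)
```

This is also where the remaining duplicates come from: if a paper repeats the headline or byline inside the chosen container, it gets read out again.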
The TTS engine connects directly to ALSA. This is where this project differs from other implementations of the Android TTS port to the Raspi, which either generate a wave file that is then played by ALSA, or pipe into ALSA. Generating a wave file takes too long when the text to be spoken is long and leads to a frustrating user experience. Piping is faster than waiting but is unwieldy, to put it mildly, in Python and leads to heavy system loads. Therefore, recompiling the TTS application to interface directly with the ALSA API turned out to be the best option. Credit to the folks at ALSA for making it easy to interface with from C++ applications.
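For comparison, here is a minimal sketch of the piping approach that the project moved away from. The `run_pipeline` helper is hypothetical; the commented-out example at the bottom assumes eSpeak's real `--stdout` flag (which writes a WAV stream) being piped into ALSA's `aplay`:

```python
import subprocess

def run_pipeline(commands, capture=False):
    """Chain commands so each one's stdout feeds the next one's stdin."""
    procs = []
    prev_out = None
    for i, cmd in enumerate(commands):
        last = (i == len(commands) - 1)
        proc = subprocess.Popen(
            cmd,
            stdin=prev_out,
            stdout=subprocess.PIPE if (not last or capture) else None,
        )
        if prev_out is not None:
            prev_out.close()  # let the upstream process see SIGPIPE
        prev_out = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0] if capture else None
    for proc in procs:
        proc.wait()
    return output

# e.g. synthesized speech piped straight into ALSA's command-line player:
# run_pipeline([["espeak", "--stdout", "hello world"], ["aplay", "-q"]])
```

Even wrapped up like this, every utterance spawns two processes and shovels PCM data through a pipe, which is part of the "heavy system loads" problem; calling the ALSA API directly from the recompiled TTS engine avoids all of it.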
Overall, this product works reasonably well considering the comparatively simple setup. It has been used by one elderly blind person since late 2014.
Very cool project! :) I'm also working on a device for people with visual impairments. Could I ask where to find more information on the ported Android TTS engine - how to install it, how to get started, etc.? I'm using eSpeak, but Android's TTS sounds more natural, so I'd really love to give it a try :)