Step 1
If you only want to use the TTS engine with ALSA integration, follow steps 1, 2, 4 and 5. For the Newspaper Reader, follow all steps.
First install the TTS engine. I suggest you use mfurquim's fork.
git clone git://github.com/mfurquim/picopi.git
As an alternative you can go with the original repository from Doug
git clone git://github.com/DougGore/picopi.git
If you go with the original version you will need to add these files: strdup8to16.c, strdup16to8.c, strdup16to8.cpp and jstring.h, which you can get from mfurquim's fork.
Follow the first steps of Doug Gore's instructions to set up the library:
cd picopi/pico/lib
make && sudo make install
It is normal to see a lot of warnings.
You now have the TTS library installed.
Step 2
Now let's install ALSA's development environment. This can take some time.
cd picopi/pico
wget http://alsa.cybermirror.org/lib/alsa-lib-1.0.29.tar.bz2
tar -xjvf alsa-lib-1.0.29.tar.bz2
cd alsa-lib-1.0*
./configure && make
sudo make install
cd ..
rm alsa-lib-1.0.29.tar.bz2
##UPDATE FEBRUARY 2017:
##ALSA 1.1.3 is the current version. Use these commands instead to download and compile
cd picopi/pico
wget ftp://ftp.alsa-project.org/pub/lib/alsa-lib-1.1.3.tar.bz2
tar -xjvf alsa-lib-1.1.3.tar.bz2
cd alsa-lib-1.1*
./configure && make
sudo make install
cd ..
rm alsa-lib-1.1.3.tar.bz2
… and test if it works by compiling one of the supplied example programs (use alsa-lib-1.1.3/test if you built 1.1.3)
cd alsa-lib-1.0.29/test
make pcm
amixer cset numid=3 1
./pcm
You should hear a 440 Hz sine wave when plugging a headphone into the analog jack. The amixer command above directs sound to the analog jack. If you want it output differently you need to change the amixer options.
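For orientation, here is a minimal sketch of the kind of signal you are listening to. It is not the ALSA pcm test program itself: the function name makeSineWave, the 44100 Hz sample rate and the 16-bit amplitude are my own assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns `count` signed 16-bit samples of a sine tone.
// The default sample rate and amplitude are illustrative assumptions, not
// values taken from the ALSA pcm test program.
std::vector<int16_t> makeSineWave(double frequencyHz, std::size_t count,
                                  double sampleRate = 44100.0,
                                  double amplitude = 32767.0) {
    const double kTwoPi = 6.283185307179586;
    std::vector<int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        // Phase advances by 2*pi*f/sampleRate per sample.
        const double phase =
            kTwoPi * frequencyHz * static_cast<double>(n) / sampleRate;
        samples[n] =
            static_cast<int16_t>(std::lround(amplitude * std::sin(phase)));
    }
    return samples;
}
```

Calling makeSineWave(440.0, 44100) yields one second of the test tone under these assumptions. At 44100 Hz one period of a 440 Hz tone spans roughly 100 samples.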
At the time of writing alsa-lib-1.0.29 is the latest version. UPDATE FEB 2017: ALSA 1.1.3
Step 3
Now we will add some of the components we need for Python. Again, this may take some time.
sudo apt-get install espeak python-espeak python-dev python-pip python-alsaaudio
sudo pip install feedparser beautifulsoup4
Why do we need eSpeak if we are going to run the Android TTS engine? I have gone back and forth on this point but decided that eSpeak is good enough for reading out the names of the papers. It gives you an extra voice that makes it easier for the user to determine, just based on voice, where in the menu hierarchy (s)he is.
In addition, if you plan to use the MPL3155 module you need to install
sudo apt-get install python-smbus
Step 4
Now we need to add a file that is missing from the tts repositories. Fortunately Google themselves make it available but it needs some modifications to make it work with the Raspi port. You can find the file at https://android.googlesource.com/platform/external/svox/+/9b08cc440f25c4722ca112642be87053fc47f918/pico/tts/com_svox_picottsengine.cpp
cd picopi/pico/tts
nano com_svox_picottsengine.cpp
Paste the file into the editor. Before you save it, a few changes need to be made within the file.
- Add the following define statements at the beginning of the file
#if 0
#define ALOGE printf
#define ALOGI printf
#define ALOGV printf
#else
#define ALOGE
#define ALOGI
#define ALOGV
#endif
- Find and replace all occurrences of LOGE by ALOGE, LOGV by ALOGV and LOGI by ALOGI
- Comment out or delete the include statements for log.h and AndroidRuntime.h
- Add these include statements
#include "jstring.h"
#include "svox_ssml_parser.h"
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
Make sure these are included before the include for TtsEngine.h
- Change #include <tts/TtsEngine.h> to #include <TtsEngine.h>
- After
using namespace android;
add
using namespace std;
- Change the file path for PICO_LINGWARE_PATH to
const char* PICO_LINGWARE_PATH = "../lang/";
- After
const char * PICO_PHONEME_OPEN_TAG = "<phoneme ph='";
add
const char * PICO_PHONEME_CLOSE_TAG = "'/>";
- Find the function
tts_result TtsEngine::init( synthDoneCB_t synthDoneCBPtr )
In the original file that you pulled from the web this would have been in line 522.
Add one parameter:
tts_result TtsEngine::init( synthDoneCB_t synthDoneCBPtr , const char *config)
- Find
TtsEngine::setAudioFormat(AudioSystem::audio_format& encoding, uint32
Replace with
TtsEngine::setAudioFormat(tts_audio_format& encoding, uint32
and leave the rest of the line as is
- Find the line
ret = pico_resetEngine(pico_Engine);
Replace with (or add the extra argument)
ret = pico_resetEngine(pico_Engine, PICO_RESET_SOFT);
- Find the three occurrences of AudioSystem::PCM_16_BIT and replace each with TTS_AUDIO_FORMAT_16_BIT
- Remove the entire function TtsEngine::synthesizeIpa( ... ) { ... }
After the end of the synthesizeText function you should have the TtsEngine::stop() function.
- Add these functions
/** createPhonemeString
 *  Wrap all individual words in <phoneme> tags.
 *  The Pico <phoneme> tag only supports one word in each tag,
 *  therefore they must be individually wrapped!
 *  @xsampa - text to convert to Pico phoneme string
 *  @length - length of the input string
 *  return new string with tags applied
 */
extern char * createPhonemeString( const char * xsampa, int length )
{
    char * convstring = NULL;
    int origStrLen = strlen(xsampa);
    int numWords = 1;
    int start, totalLength, i, j;

    for (i = 0; i < origStrLen; i ++) {
        if ((xsampa[i] == ' ') || (xsampa[i] == '#')) {
            numWords ++;
        }
    }

    if (numWords == 1) {
        convstring = new char[origStrLen + 17];
        convstring[0] = '\0';
        strcat(convstring, PICO_PHONEME_OPEN_TAG);
        strcat(convstring, xsampa);
        strcat(convstring, PICO_PHONEME_CLOSE_TAG);
    } else {
        char * words[numWords];
        start = 0; totalLength = 0; i = 0; j = 0;
        for (i=0, j=0; i < origStrLen; i++) {
            if ((xsampa[i] == ' ') || (xsampa[i] == '#')) {
                words[j] = new char[i+1-start+17];
                words[j][0] = '\0';
                strcat( words[j], PICO_PHONEME_OPEN_TAG);
                strncat(words[j], xsampa+start, i-start);
                strcat( words[j], PICO_PHONEME_CLOSE_TAG);
                start = i + 1;
                j++;
                totalLength += strlen(words[j-1]);
            }
        }
        words[j] = new char[i+1-start+17];
        words[j][0] = '\0';
        strcat(words[j], PICO_PHONEME_OPEN_TAG);
        strcat(words[j], xsampa+start);
        strcat(words[j], PICO_PHONEME_CLOSE_TAG);
        totalLength += strlen(words[j]);
        convstring = new char[totalLength + 1];
        convstring[0] = '\0';
        for (i=0 ; i < numWords ; i++) {
            strcat(convstring, words[i]);
            delete [] words[i];
        }
    }
    return convstring;
}

/* The XSAMPA uses as many as 5 characters to represent a single IPA code. */
typedef struct tagPhnArr
{
    char16_t strIPA;       /* IPA Unicode symbol */
    char     strXSAMPA[6]; /* SAMPA sequence */
} PArr;

#define phn_cnt (134+7)

PArr PhnAry[phn_cnt] = {
    /* XSAMPA conversion table
       This maps a single IPA symbol to a sequence representing XSAMPA.
       This relies upon a direct one-to-one correspondence
       including diphthongs and affricates. */

    /* Vowels (23) complete */
    {0x025B, "E"}, {0x0251, "A"}, {0x0254, "O"}, {0x00F8, "2"},
    {0x0153, "9"}, {0x0276, "&"}, {0x0252, "Q"}, {0x028C, "V"},
    {0x0264, "7"}, {0x026F, "M"}, {0x0268, "1"}, {0x0289, "}"},
    {0x026A, "I"}, {0x028F, "Y"}, {0x028A, "U"}, {0x0259, "@"},
    {0x0275, "8"}, {0x0250, "6"}, {0x00E6, "{"}, {0x025C, "3"},
    {0x025A, "@`"}, {0x025E, "3\\"}, {0x0258, "@\\"},

    /* Consonants (60) complete */
    {0x0288, "t`"}, {0x0256, "d`"}, {0x025F, "J\\"}, {0x0261, "g"},
    {0x0262, "G\\"}, {0x0294, "?"}, {0x0271, "F"}, {0x0273, "n`"},
    {0x0272, "J"}, {0x014B, "N"}, {0x0274, "N\\"}, {0x0299, "B\\"},
    {0x0280, "R\\"}, {0x027E, "4"}, {0x027D, "r`"}, {0x0278, "p\\"},
    {0x03B2, "B"}, {0x03B8, "T"}, {0x00F0, "D"}, {0x0283, "S"},
    {0x0292, "Z"}, {0x0282, "s`"}, {0x0290, "z`"}, {0x00E7, "C"},
    {0x029D, "j\\"}, {0x0263, "G"}, {0x03C7, "X"}, {0x0281, "R"},
    {0x0127, "X\\"}, {0x0295, "?\\"}, {0x0266, "h\\"}, {0x026C, "K"},
    {0x026E, "K\\"}, {0x028B, "P"}, {0x0279, "r\\"}, {0x027B, "r\\'"},
    {0x0270, "M\\"}, {0x026D, "l`"}, {0x028E, "L"}, {0x029F, "L\\"},
    {0x0253, "b_<"}, {0x0257, "d_<"}, {0x0284, "J\\_<"}, {0x0260, "g_<"},
    {0x029B, "G\\_<"}, {0x028D, "W"}, {0x0265, "H"}, {0x029C, "H\\"},
    {0x02A1, ">\\"}, {0x02A2, "<\\"},
    {0x0267, "x\\"},    /* hooktop heng */
    {0x0298, "O\\"}, {0x01C0, "|\\"}, {0x01C3, "!\\"}, {0x01C2, "=\\"},
    {0x01C1, "|\\|\\"}, {0x027A, "l\\"}, {0x0255, "s\\"}, {0x0291, "z\\"},
    {0x026B, "l_G"},

    /* Diacritics (37) complete */
    {0x02BC, "_>"}, {0x0325, "_0"}, {0x030A, "_0"}, {0x032C, "_v"},
    {0x02B0, "_h"}, {0x0324, "_t"}, {0x0330, "_k"}, {0x033C, "_N"},
    {0x032A, "_d"}, {0x033A, "_a"}, {0x033B, "_m"}, {0x0339, "_O"},
    {0x031C, "_c"}, {0x031F, "_+"}, {0x0320, "_-"},
    {0x0308, "_\""},    /* centralized */
    {0x033D, "_x"}, {0x0318, "_A"}, {0x0319, "_q"}, {0x02DE, "`"},
    {0x02B7, "_w"}, {0x02B2, "_j"}, {0x02E0, "_G"},
    {0x02E4, "_?\\"},   /* pharyngealized */
    {0x0303, "~"},      /* nasalized */
    {0x207F, "_n"}, {0x02E1, "_l"}, {0x031A, "_}"}, {0x0334, "_e"},
    {0x031D, "_r"},     /* raised equivalent to 02D4 */
    {0x02D4, "_r"},     /* raised equivalent to 031D */
    {0x031E, "_o"},     /* lowered equivalent to 02D5 */
    {0x02D5, "_o"},     /* lowered equivalent to 031E */
    {0x0329, "="},      /* syllabic */
    {0x032F, "_^"},     /* non-syllabic */
    {0x0361, "_"},      /* top tie bar */
    {0x035C, "_"},

    /* Suprasegmental (15) incomplete */
    {0x02C8, "\""},     /* primary stress */
    {0x02CC, "%"},      /* secondary stress */
    {0x02D0, ":"},      /* long */
    {0x02D1, ":\\"},    /* half-long */
    {0x0306, "_X"},     /* extra short */
    {0x2016, "||"},     /* major group */
    {0x203F, "-\\"},    /* bottom tie bar */
    {0x2197, "<R>"},    /* global rise */
    {0x2198, "<F>"},    /* global fall */
    {0x2193, "<D>"},    /* downstep */
    {0x2191, "<U>"},    /* upstep */
    {0x02E5, "<T>"},    /* extra high level */
    {0x02E7, "<M>"},    /* mid level */
    {0x02E9, "<B>"},    /* extra low level */
    {0x025D, "3`:"},    /* non-IPA %% */

    /* Affricates (6) complete */
    {0x02A3, "d_z"}, {0x02A4, "d_Z"}, {0x02A5, "d_z\\"},
    {0x02A6, "t_s"}, {0x02A7, "t_S"}, {0x02A8, "t_s\\"}
};

void CnvIPAPnt( const char16_t IPnt, char * XPnt )
{
    char16_t ThisPnt = IPnt;    /* local copy of single IPA codepoint */
    int idx;                    /* index into table */

    /* Convert an individual IPA codepoint.
       A single IPA code could map to a string.
       Search the table. If it is not found, use the same character.
       Since most codepoints can be contained within 16 bits,
       they are represented as wide chars. */
    XPnt[0] = 0;                /* clear the result string */

    /* Search the table for the conversion. */
    for (idx = 0; idx < phn_cnt; idx ++) {          /* for each item in table */
        if (IPnt == PhnAry[idx].strIPA) {           /* matches IPA code */
            strcat( XPnt, (const char *)&(PhnAry[idx].strXSAMPA) ); /* copy the XSAMPA string */
            return;
        }
    }
    strcat(XPnt, (const char *)&ThisPnt);           /* just copy it */
}

/** cnvIpaToXsampa
 *  Convert an IPA character string to an XSAMPA character string.
 *  @ipaString - input IPA string to convert
 *  @outXsampaString - converted XSAMPA string is passed back in this parameter
 *  return size of the new string
 */
int cnvIpaToXsampa( const char16_t * ipaString, size_t ipaStringSize, char ** outXsampaString )
{
    size_t xsize;       /* size of result */
    size_t ipidx;       /* index into IPA string */
    char * XPnt;        /* short XSAMPA char sequence */

    /* Convert an IPA string to an XSAMPA string and store the xsampa string in *outXsampaString.
       It is the responsibility of the caller to free the allocated string.
       Increment through the string. For each base & combination convert it to the XSAMPA equivalent.
       Because of the XSAMPA limitations, not all IPA characters will be covered. */
    XPnt = (char *) malloc(6);
    xsize = (4 * ipaStringSize) + 8;                /* assume more than double size */
    *outXsampaString = (char *) malloc( xsize );    /* allocate return string */
    *outXsampaString[0] = 0;
    xsize = 0;                                      /* clear final */

    for (ipidx = 0; ipidx < ipaStringSize; ipidx ++) {  /* for each IPA code */
        CnvIPAPnt( ipaString[ipidx], XPnt );            /* get converted character */
        strcat((char *)*outXsampaString, XPnt );        /* concatenate XSAMPA */
    }
    free(XPnt);
    xsize = strlen(*outXsampaString);               /* get the final length */
    return xsize;
}
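If you want to see what createPhonemeString produces without building the whole engine, the wrapping rule can be restated with std::string. This is only a sketch of the same rule (one <phoneme> tag per run of text between ' ' or '#'), not a drop-in replacement; wrapPhonemes and the two k-prefixed constants are names I made up for the illustration.

```cpp
#include <cstddef>
#include <string>

// Same tag strings as PICO_PHONEME_OPEN_TAG / PICO_PHONEME_CLOSE_TAG.
const char* const kPhonemeOpenTag = "<phoneme ph='";
const char* const kPhonemeCloseTag = "'/>";

// Hypothetical std::string restatement of createPhonemeString: every run of
// text delimited by ' ' or '#' is wrapped in its own <phoneme> tag.
std::string wrapPhonemes(const std::string& xsampa) {
    std::string result;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= xsampa.size(); ++i) {
        const bool atEnd = (i == xsampa.size());
        if (atEnd || xsampa[i] == ' ' || xsampa[i] == '#') {
            result += kPhonemeOpenTag;
            result.append(xsampa, start, i - start);  // the word itself
            result += kPhonemeCloseTag;
            start = i + 1;                            // skip the delimiter
        }
    }
    return result;
}
```

The open tag is 13 characters and the close tag 3, which together with the terminating NUL explains the "+ 17" in the buffer sizes of the original function.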
Save and exit.
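A note on the define statements from the first change: with "#if 0" the second branch is active, so ALOGE, ALOGI and ALOGV expand to nothing and a call such as ALOGV("...", x); turns into a plain parenthesized expression that prints nothing. Change the 0 to 1 and the same calls go to printf. The sketch below shows this in isolation; PICO_LOG_TO_STDOUT and logDemo are names I made up for the example, the tutorial file uses the literal "#if 0".

```cpp
#include <stdio.h>

// Hypothetical switch name for this sketch; set to 1 to route logs to printf.
#define PICO_LOG_TO_STDOUT 0

#if PICO_LOG_TO_STDOUT
#define ALOGE printf
#define ALOGI printf
#define ALOGV printf
#else
#define ALOGE
#define ALOGI
#define ALOGV
#endif

// Returns how many times the log arguments were evaluated.
int logDemo() {
    int evaluations = 0;
    // With the empty macros this becomes ("voice %d loaded\n", ++evaluations);
    // which prints nothing but still runs the increment.
    ALOGV("voice %d loaded\n", ++evaluations);
    ALOGE("error path reached %d\n", ++evaluations);
    return evaluations;
}
```

Because the arguments are still evaluated either way, the silenced calls are one likely source of "expression has no effect" style warnings when you build.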
Step 5
Next delete main.cpp and the Makefile in the picopi/pico/tts directory that you should still be in. They will be replaced by my version from git.
Build the program in the tts directory.
rm Makefile
rm main.cpp
git clone git://github.com/kirchnet/Newspaper-Reader.git
make
Ignore the warnings. You should see a new executable tts.alsa in the tts subdirectory. Try it out:
amixer cset numid=3 1
./tts.alsa
If you plug a headphone into the audio jack you should hear "test, test, test." To output the sound elsewhere, change the amixer command. The file language.txt defines the language in which the contents of the file temp.txt are synthesized. Try other languages. Allowable values are eng-USA, eng-GBR, deu-DEU, spa-ESP, fra-FRA or ita-ITA. As a bonus I added an excerpt from a book I published last year that illustrates the use of the <pitch> tag in the text. This is meant to be synthesized in deu-DEU.
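If you script changes to language.txt, a small check against the six allowable values can catch typos early. The following is a sketch only: isSupportedLanguage and kPicoLanguages are hypothetical names, and I am assuming that stray whitespace such as a trailing newline from your editor should be ignored. How tts.alsa itself parses the file is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// The six language codes listed above.
const std::array<const char*, 6> kPicoLanguages = {
    "eng-USA", "eng-GBR", "deu-DEU", "spa-ESP", "fra-FRA", "ita-ITA"};

// Hypothetical helper: true if `code`, with surrounding whitespace removed,
// is one of the listed values. The comparison is case-sensitive.
bool isSupportedLanguage(std::string code) {
    const char* whitespace = " \t\r\n";
    const std::size_t first = code.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return false;  // empty or whitespace-only input
    }
    const std::size_t last = code.find_last_not_of(whitespace);
    code = code.substr(first, last - first + 1);
    return std::any_of(kPicoLanguages.begin(), kPicoLanguages.end(),
                       [&code](const char* known) { return code == known; });
}
```

For example, isSupportedLanguage("deu-DEU\n") is true, while a code that is not in the list, such as "nld-NLD", is rejected.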
If all you want to do is use the TTS engine you are done now. If you want to build the Newspaper Reader continue with the next step.
Step 6
more to come soon ...
Step 7
Install WiringPi. Works on both Raspi and BananaPi. <not needed? need to double check>
Step 8
On the hardware side we need to add three buttons, and if you choose to use it we also need to add <TODO>