Summary
With the 'compact' form of the rules in hand, it is time to use them.
Deets
I ported the code that processes the rules into C. This was a bit more trouble than I anticipated because the Python version leans on some conveniences of that environment -- especially dynamically sized arrays and string concatenation. Since this code is going to be running in an embedded environment, I wanted to avoid copying to temporary and dynamically allocated buffers as much as possible, and instead process directly out of existing buffers or constant definitions. Additionally, there was a hack in the original rules that required a space to be prepended and appended to the word. This hack allowed using the space as a meta-character for 'Nothing', which was used to indicate that a context pattern needed to be at the very beginning or end of the text. I wound up creating a separate meta-character for that, '$', and updated all the rules accordingly. That addition caused me to generate a new distinct string, so I incurred a two-byte penalty, bringing the compactified rules to 9385 bytes.
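To make the '$' idea concrete, here is a minimal sketch of how an edge-of-word anchor could be checked in place, without padding the word with spaces or copying it anywhere. The function names and the pattern syntax (literal characters plus '$') are assumptions for illustration only; the actual rule encoding and matcher live in the repo and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: '$' means "nothing here", i.e. the edge of the word.
   Every other pattern character is matched literally. */

/* Does 'pat' match the text starting at word[pos] and running rightwards? */
static bool match_right(const char *word, size_t len, size_t pos, const char *pat)
{
    assert(word != NULL && pat != NULL && pos <= len);
    for (; *pat != '\0'; ++pat) {
        if (*pat == '$') {
            if (pos != len) return false;        /* must be at the very end */
        } else {
            if (pos >= len || word[pos] != *pat) return false;
            ++pos;
        }
    }
    return true;
}

/* Does 'pat' match the text ending just before word[pos] (read leftwards)? */
static bool match_left(const char *word, size_t len, size_t pos, const char *pat)
{
    assert(word != NULL && pat != NULL && pos <= len);
    (void)len;                                   /* only needed for the assert */
    size_t n = strlen(pat);
    while (n > 0) {
        char c = pat[--n];
        if (c == '$') {
            if (pos != 0) return false;          /* must be at the very start */
        } else {
            if (pos == 0 || word[pos - 1] != c) return false;
            --pos;
        }
    }
    return true;
}
```

Both functions only read from the caller's buffer and the constant pattern, which is the property that matters on a part with little RAM.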
Incrementally building the code shows these numbers for flash usage:
- 40816 baseline
- 50208 rules included; delta = 9392
- 51908 tts code; delta = 1700
- 51964 simple test code to use TTS to translate a sentence; delta = 56
So this is not too bad: about 1.7 KB for the actual code, and the simple test (which is fairly representative of how it would be used in practice) is quite small at just 56 bytes.
This means that there is about 12 KB more flash for code growth before the next crisis. I think this might be OK for the remaining stuff I have planned. I've got a little more than 7 KB of RAM left, and I think this will be enough, too, to finish things up.
The simple test code:
static const char achGettysburg[] =
"four score and seven years ago our fathers brought forth on this continent \
a new nation, conceived in liberty, and dedicated to the proposition that all \
men are created equal.";
const char* pszText = achGettysburg;
int nTextLen = COUNTOF(achGettysburg);

//quicky test running through text
const char* pchWordStart, * pchWordEnd;
int eCvt;
while ( 0 == ( eCvt = pluckWord ( pszText, nTextLen,
        &pchWordStart, &pchWordEnd ) ) )
{
    int nWordLen = pchWordEnd - pchWordStart;
    static uint8_t sl_abyPhon[64]; //semi-arbitrarily sized long word
    int nProduced = ttsWord(pchWordStart, nWordLen,
            g_abyTTS, sl_abyPhon, COUNTOF(sl_abyPhon) );

    //stick on a space between words if there is not already a pause
    if ( sl_abyPhon[nProduced-1] > 4 ) //all pauses are code 0 - 4
    {
        sl_abyPhon[nProduced++] = '\x03';
        sl_abyPhon[nProduced++] = '\x02';
    }

    size_t nIdxPhon = 0;
    size_t nRemaining = nProduced;
    while ( nRemaining > 0 )
    {
        size_t nConsumed = SP0256_push ( &sl_abyPhon[nIdxPhon], nRemaining );
        nRemaining -= nConsumed;
        nIdxPhon += nConsumed;
        if ( 0 != nRemaining )
        {
            osDelay ( 200 ); //sleep a little to let the synth catch up
        }
    }

    //advance
    nTextLen -= pchWordEnd - pszText;
    pszText = pchWordEnd;
}
So the gist of using it is to crack the text word-by-word (there is a convenience function pluckWord() provided for this), and then for each word 'plucked' from the buffer, push it into ttsWord() to translate it into a phoneme sequence. You can then send this sequence off to the SP0256 task (or whatever).
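For readers who want a feel for the 'plucking' step, here is a simplified stand-in with the same call shape as the loop above: it returns 0 and sets a start and end pointer when a complete word is found. Everything in it is an assumption for illustration: the name `pluck_word_sketch`, the set of breaking characters, and the choice to leave punctuation attached to the word. The real pluckWord() handles punctuation in its own way and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: these count as 'breaking' characters; the real list may differ. */
static int is_break(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n' || ch == '\0';
}

/* Hypothetical stand-in for pluckWord(): returns 0 and sets [*start, *end)
   when a complete word is found, nonzero otherwise. Nothing is copied;
   the pointers refer straight into the caller's (possibly const) buffer. */
static int pluck_word_sketch(const char *text, int len,
                             const char **start, const char **end)
{
    assert(text != NULL && start != NULL && end != NULL);
    int i = 0;
    while (i < len && is_break(text[i])) ++i;     /* skip leading breaks */
    if (i >= len) return -1;                      /* nothing left to pluck */
    int j = i;
    while (j < len && !is_break(text[j])) ++j;    /* scan the word body */
    if (j >= len) return -1;                      /* no break seen: word may be incomplete */
    *start = text + i;
    *end = text + j;
    return 0;
}
```

The caller advances by setting its text pointer to `*end` and shrinking the length by the distance consumed, as the test loop does.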
I added some debug code to make it send the plucked word and text-to-speeched phoneme sequence to the serial for debugging. E.g. for the first sentence of the Gettysburg address:
four 28 35 33 03 02 FF OW ER2 PA4 PA3
score 37 08 35 33 03 02 SS KK3 OW ER2
and 1a 0b 15 03 02 AE NN1 DD1
seven 37 07 23 07 0b 03 02 SS EH VV EH NN1
years 0c 13 33 2b 03 02 IH IY ER2 ZZ
ago 1a 3d 35 03 02 AE GG2 OW
our 20 33 03 02 AW ER2
fathers 28 1a 36 01 34 2b 03 02 FF AE DH2 PA2 ER2 ZZ
brought 1c 27 17 0d 03 02 BB1 RR2 AO TT2
forth 28 17 17 33 1d 03 02 FF AO AO ER2 TH
on 17 0b 03 02 AO NN1
this 36 0c 0c 37 37 03 02 DH2 IH IH SS SS
continent 08 18 0b 0d 06 0b 07 0b 0d 03 02 KK3 AA NN1 TT2 AY NN1 EH NN1 TT2
a 07 14 03 02 EH EY
new 0b 1f 03 02 NN1 UW2
nation, 0b 14 00 25 0e 0b 04 NN1 EY PA1 SH RR1 NN1 PA5
conceived 08 18 0b 37 13 23 07 15 03 02 KK3 AA NN1 SS IY VV DD1
in 0c 0c 0b 03 02 IH IH NN1
liberty, 2d 0c 3f 34 0d 0c 04 LL IH BB2 ER2 TT2 IH PA5
and 1a 0b 15 03 02 AE NN1 DD1
dedicated 21 0c 21 0c 2a 1a 1a 00 0d 0c 15 03 02 DD2 IH DD2 IH KK1 AE AE PA1 TT2 IH DD1
to 0d 1f 03 02 TT2 UW2
the 12 13 03 02 UW2 IY
proposition 09 27 0e 0e 09 0e 2b 0c 00 25 0e 0b 03 02 PP RR2 RR1 RR1 PP RR1 ZZ IH PA1 SH RR1 NN1
that 36 1a 0d 03 02 DH2 AE TT2
all 17 2d 03 02 AO LL
men 10 07 0b 03 02 MM EH NN1
are 18 34 03 02 AA ER2
created 08 33 13 14 00 0d 0c 15 03 02 KK3 ER2 IY EY PA1 TT2 IH DD1
equal. 13 2a 2e 1a 2d 04 IY KK1 WW AE LL PA5 PA5 PA4
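In the trace, most words end in 03 02 (PA4 PA3), while words that already end in a pause (e.g. 'nation,' ending in 04, PA5) are left alone. The sketch below isolates that inter-word gap step into a helper and adds two checks the quick test does not have: it does nothing for an empty result, and it refuses to write past the end of the buffer. The helper and macro names are hypothetical; only the pause code range 0 - 4 and the 03 02 pair come from the test code and trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHON_PA3        0x02u   /* pause codes as they appear in the trace */
#define PHON_PA4        0x03u
#define PHON_LAST_PAUSE 0x04u   /* codes 0x00..0x04 are PA1..PA5 */

static int is_pause(uint8_t code)
{
    return code <= PHON_LAST_PAUSE;
}

/* Appends PA4 PA3 after a word unless it is empty or already ends in a pause.
   Returns the new count, or the old count unchanged if there is no room. */
static size_t append_word_gap(uint8_t *phon, size_t count, size_t capacity)
{
    assert(phon != NULL && count <= capacity);
    if (count == 0 || is_pause(phon[count - 1])) return count;
    if (capacity - count < 2) return count;      /* would overflow: leave as-is */
    phon[count++] = PHON_PA4;
    phon[count++] = PHON_PA3;
    return count;
}
```

Whether silently skipping the gap on a full buffer is the right policy depends on the application; returning an error instead would be just as reasonable.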
I did go ahead and wire in a command in the monitor for testing this stuff: 'sp' for 'speak'. You're meant to supply a sentence, and it will parse and translate it much as the code shown above does (with a little extra error checking).
Now I'm curious about simulating the SP0256-AL2 using a PWM output. In this way, you wouldn't need the physical chip to enjoy 1970's era speech synthesis output. This will be a challenge with the flash -- the audio files as-is are something like 144 KiB total -- /that/ won't fit! Also, although the chip (STM32F103C8) is designated and self-reports as having 64 KiB flash, it is an open secret that the device in fact has 128 KiB (same as the 'CB). I will exploit this to get the extra room I need if it all works out.
Next
Chasing another goose named 'SP0256-AL2 simulation'.
Discussions
Ahh thanks !
I am also playing with the chip with an Arduino M0 (sorting out some fake chips)
I would like to try the text-to-phoneme part on the Arduino too
It should port over easily as there are no special libs beyond the standard C library (I think there is a strlen() call, and it's not strictly required).
* tts_rules_compact.h, .c are the blob of the TTS rules. As I mentioned in the post, to save space I compacted these into this form. The original rules in human readable form are in the Python PoC.
* text_to_speech.h, .c is what processes the rules, transforming English to phoneme sequences. It also has a 'word cracking' function for breaking up a sentence into words. (The algorithm uses a slightly non-obvious word separation technique with regards to punctuation.)
Also, the two functions provided were defined such that they are suitable for processing directly from constant buffers, requiring no mallocs or read/write memory (other than the phoneme buffer which you provide). This was to reduce RAM requirements, but also to facilitate streaming in data of indefinite length.
A consequence of this is that there does need to be a 'breaking' character at the end of a 'sentence'. (This could be a LF, which is ignored phonetically.) If you imagine typing text into a terminal, which is then processed by the algorithm, if a sentence 'I wasn't going to the store' happened to be processed at the time 'I was' is received, then the 'I' part would be correctly transformed, but the 'was' part would not be because in truth the word had not been fully received. So that's why it is required that there be some final word breaking character at the end of the complete text -- to avoid spurious word breaking while streaming.
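One way to picture the streaming case is a receive buffer where only the part ending on a breaking character is handed to the translator, and the unfinished tail is held until more text arrives. The helper below is a hypothetical sketch of that split, not code from the repo, and its list of breaking characters is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: whitespace is what terminates a word while streaming. */
static int is_break_char(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
}

/* Length of the leading part of buf that ends on a breaking character.
   Bytes after that point may be a partial word and should be kept back. */
static size_t complete_prefix_len(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0 && !is_break_char(buf[len - 1])) --len;
    return len;
}
```

With "I was" in the buffer this yields 2, so "I " can be processed while "was" waits for the rest of "wasn't".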
An undocumented feature (which I /think/ works) is that if you provide a 0-length phoneme buffer, the routine will fail and give you a negative result which is the number of phonemes required. I don't use this feature, but it seemed useful when I was writing the code.
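If that feature behaves as described, it allows the usual two-call pattern: probe with a 0-length buffer to learn the size, then call again with a buffer that is big enough. The sketch below shows only the calling pattern, using a stand-in function (`fake_tts_word`, which is NOT the real translator and just emits one dummy code per character). The real ttsWord() also takes the rules blob as a parameter, and its exact return convention should be checked against the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ttsWord(), NOT the real translator: emits one dummy code per
   input character so the buffer-size contract can be exercised. */
static int fake_tts_word(const char *word, int word_len,
                         uint8_t *out, size_t out_cap)
{
    assert(word != NULL && word_len >= 0);
    if ((size_t)word_len > out_cap) return -word_len;  /* negative = codes required */
    for (int i = 0; i < word_len; ++i) out[i] = (uint8_t)(word[i] & 0x3f);
    return word_len;
}

/* Probe with a 0-length buffer; returns the number of codes the word needs. */
static int phonemes_needed(const char *word, int word_len)
{
    int r = fake_tts_word(word, word_len, NULL, 0);
    return r < 0 ? -r : r;
}
```

On a small microcontroller a fixed buffer, like the 64-byte one in the test code, is usually the simpler choice; the probe is mostly useful for checking that such a buffer is large enough.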
Yup, in the 'project links' there are two github repos -- one is for the python PoC code, and the other is for the BluePill codebase.
You'll (almost) certainly also need an STLink-V2 programmer if you don't already have one. The uber cheap Chinese ones work fine. (I say 'almost' because there is a way to burn the firmware over the serial port, though I've never done this myself.)
Let me know how it goes. I'm expecting to be 'done' with this project in the next couple days, meaning it will drive both the physical SP0256-AL2 (as it does now), but also be able to simulate the chip standalone with PWM. When I'm completely done, I'll put a pre-built firmware in the 'files' section so folks don't have to install the toolchain if they just want to kick the tires.
Nice work! I just ordered a Blue Pill to test your code.
Is it published somewhere?