From the Latin, "Docta Vox" meaning "learned voice" or something like that according to Google Translate. It sounded cool and I'm bad at naming things anyway. So there you have it.
This project uses Processing with STT (speech to text) and TTS (text to speech) to allow verbal communication with the program. The program communicates with an Arduino with RF transmitter via serial. A simple demonstration of the project in action is found here.
"Lamp on." *click* "Lamp has been turned on." *commence feeling powerful*
I started this project a couple of months ago with the simple hopes of
controlling an outlet. Flashing an LED is great, but when you need to
make something more substantial happen, this is the thing to do. I
was almost tempted to get relays like this one. I decided against that because of this excellent tutorial called "Arduino Controlled Relay Box."
After I priced it all out at Lowe's and Sparkfun,
I found that each box would cost around $30. These boxes are large,
don't forget, and require a wired connection to the Arduino. To
control five outlets would cost $150 in components plus low-voltage
wire to run around the house to each outlet. I'm not sure my family is
going to be ok with bundles of wires running down the hall.
Time for a better solution. I scrounged around the vast caverns of Amazon to find what I thought would be the best fit for my plans. Etekcity 5LX remote outlets are just the ticket. They cost a whopping $35 and offer control of 5 outlets and negate the need to run wires. Not bad.
I bought the set of 5 outlets and two remotes, but you can also order
almost any combination by viewing the related products on the page.
My first plan was to hack the remote apart and add transistors to the
buttons. This began one of the largest failures in my hacking career: I
fried both remotes! Ah! I wound up ordering a new remote and trying a
much safer method. This new method was to sniff the RF codes of the
remote and retransmit them using the Arduino. Of course, this meant
another stop at Sparkfun for their beautifully simple RF products: RF Receiver, RF Transmitter. Both of these <$5 components proved to be quite valuable.
Once again digging up some help online, I came across an exquisite
library made for just such a thing as what I was doing. It's called the
RCSwitch Arduino library.
After downloading and installing the library, open up the advanced
receive sketch and follow the link in the commenting to see the
tutorial. Really, you can operate it without much guidance. The code
is simple, and it literally spits out codes on screen as you press
buttons on the remote. I found that on my remote the decimal value was
the easiest to work with. It will spit out a code that looks
something like this:
Decimal: 5592371 (24Bit) Binary: 010101010101010100110011 Tri-State: FFFFFFFF0101 PulseLength: 185 microseconds Protocol: 1
Raw data: 5816,220,544,592,152,224,536,596,156,220,540,588,160,220,540,592,172,204,540,596,160,212,544,592,164,208,544,592,164,212,548,588,164,208,544,216,544,588,168,208,548,208,544,208,544,208,548,208,548,
For the practical purposes of this project, you only care about the
part that says "Decimal: 5592371". Press each button on the
remote in an order that you will remember, and then copy all the data
from the serial monitor into something else (Notepad or Notepad++ would be great). Save it.
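If you are curious how the Decimal, Binary, and Tri-State fields in that output relate to each other, here is a minimal Java sketch. It is not part of the project; the class and method names are made up for illustration. The tri-state mapping (00 becomes 0, 11 becomes 1, 01 becomes F) is inferred from the sample output above, so treat it as an assumption rather than a statement of how the library works internally.

```java
public class CodeFormats {
    // Render a code as a zero-padded binary string of the given bit length.
    static String toBinary(long code, int bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits - 1; i >= 0; i--) {
            sb.append(((code >> i) & 1L) == 1L ? '1' : '0');
        }
        return sb.toString();
    }

    // Convert bit pairs to tri-state symbols (assumed mapping: 00 -> 0, 11 -> 1, 01 -> F).
    // Returns null if a pair has no tri-state symbol ("10") or a bit is left over.
    static String toTriState(String binary) {
        if (binary.length() % 2 != 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < binary.length(); i += 2) {
            String pair = binary.substring(i, i + 2);
            switch (pair) {
                case "00": sb.append('0'); break;
                case "11": sb.append('1'); break;
                case "01": sb.append('F'); break;
                default: return null;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String binary = toBinary(5592371L, 24);
        System.out.println(binary);             // 010101010101010100110011
        System.out.println(toTriState(binary)); // FFFFFFFF0101
    }
}
```

All three fields describe the same 24 bits, which is why copying only the decimal value is enough for this project.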
Next, open the transmit sketch and begin testing. I decided to make
the repeat value 9 (I discovered that is what the remote itself sends)
and I changed the decimal code to the one for "ON" on my first outlet.
I uploaded the sketch and watched it work! Success is great! I then
modified the code to do all 10 buttons on my remote. I added the
switch/case statement to shorten it a bit. If you are wondering why I
went with a serial interface, it is so that I can more easily interact
with Processing in the next section of this tutorial. Here is the
transmit code that I use in the final version.
/*
Example for different sending methods
http://code.google.com/p/rc-switch/
Need help? http://forum.ardumote.com
*/
#include <RCSwitch.h>
int message = 0;
RCSwitch mySwitch = RCSwitch();
void setup() {
Serial.begin(9600);
// Transmitter is connected to Arduino Pin #3
mySwitch.enableTransmit(3);
// Optional set pulse length.
mySwitch.setPulseLength(185);
// Optional set protocol (default is 1, will work for most outlets)
// mySwitch.setProtocol(2);
// Optional set number of transmission repetitions.
mySwitch.setRepeatTransmit(9);
}
void loop() {
delay(5);
if(Serial.available() > 0){
message = Serial.read();
switch(message){
case 'q':
mySwitch.send(5592371, 24);
break;
case 'w':
mySwitch.send(5592380, 24);
break;
case 'e':
mySwitch.send(5592515, 24);
break;
case 'r':
mySwitch.send(5592524, 24);
break;
case 't':
mySwitch.send(5592835, 24);
break;
case 'y':
mySwitch.send(5592844, 24);
break;
case 'u':
mySwitch.send(5594371, 24);
break;
case 'i':
mySwitch.send(5594380, 24);
break;
case 'o':
mySwitch.send(5600515, 24);
break;
case 'p':
mySwitch.send(5600524, 24);
break;
}
}
/* See Example: TypeA_WithDIPSwitches */
// mySwitch.switchOn("11111", "00010");
// delay(1000);
// mySwitch.switchOn("11111", "00010");
// delay(1000);
/* Same switch as above, but using binary code */
// mySwitch.send("000000000001010100010001");
// delay(1000);
// mySwitch.send("000000000001010100010100");
// delay(1000);
/* Same switch as above, but tri-state code */
// mySwitch.sendTriState("FFFFFFFF0101");
// delay(5000);
// mySwitch.sendTriState("FFFFFFFF0110");
// delay(1000);
//delay(20000);
}
If you wish to play with it now, simply open the serial monitor, type
"q", and hit enter. I used the top row of the keyboard "qwertyuiop"
because it has 10 letters and they are easy to keep track of.
Processing will learn to send these characters later.
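The letter scheme is easy to compute instead of memorize. This is a small Java sketch, assuming the pairing used in the code on this page (q/w for outlet 1 on/off, e/r for outlet 2, and so on). The class and method names are hypothetical and do not appear in the project files.

```java
public class OutletKeys {
    // Top keyboard row: each outlet gets two neighboring letters, first ON, then OFF.
    static final String KEYS = "qwertyuiop";

    // Returns the serial character for an outlet (1 to 5) and the desired state.
    static char keyFor(int outlet, boolean on) {
        if (outlet < 1 || outlet > 5) {
            throw new IllegalArgumentException("outlet must be 1-5, got " + outlet);
        }
        int index = (outlet - 1) * 2 + (on ? 0 : 1);
        return KEYS.charAt(index);
    }

    public static void main(String[] args) {
        for (int outlet = 1; outlet <= 5; outlet++) {
            System.out.println("outlet " + outlet
                + ": on=" + keyFor(outlet, true)
                + " off=" + keyFor(outlet, false));
        }
    }
}
```

In a Processing sketch, the returned character could be passed straight to port.write().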
=====================================================================================
Time to play with Processing. The Processing IDE
is a fantastic way to program your computer. It is flexible,
understands Java, and makes programming unbelievably quick and easy. I
hope you Arduino users out there already have some experience with this
so that the project is less complicated. Most of the code is
explained in the commenting, so you can pick it up from there.
I broke the code into several files (these open as tabs in Processing)

doctavox_complete.zip
As is usually the case when combining code from multiple programmers, it
can be bettered with a little work. I added a commands file in .txt
format to hold all known commands and responses. It is a CSV file and
the program splits it at commas. Anyway, this means that you can add
commands and features in less than five minutes each. Pretty flexible!
I also shortened the way to make the thing talk. The original author
required the syntax "GoogleTTS(String,
String);" Where the first string is what is to be said, and the second
string is "en" for English. Since I only use English, I added a bit of
code to assume this. Now the syntax is, "respond(String);" where the
string is what is to be said. Pretty clean!
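To make the commands file idea concrete, here is a rough Java sketch of the pairing: entries at even positions are the phrases to listen for, and entries at odd positions are the replies. The file contents shown are made up for illustration (the real knownVCommands.txt is in the download), and the class name is hypothetical. Unlike the project code, this sketch trims spaces around each entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommandTable {
    // Pair up comma-separated entries: even positions are phrases, odd positions are replies.
    static Map<String, String> parse(String csv) {
        String[] parts = csv.split(",");
        Map<String, String> table = new LinkedHashMap<>();
        for (int i = 0; i + 1 < parts.length; i += 2) {
            table.put(parts[i].trim(), parts[i + 1].trim());
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, not the actual command list.
        String csv = "lamp on,Lamp has been turned on.,lamp off,Lamp has been turned off.";
        Map<String, String> table = parse(csv);
        System.out.println(table.get("lamp on"));
    }
}
```

One consequence of splitting at commas is that neither a phrase nor a reply can contain a comma itself.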
Ok, let's take a look at the main setup and loop.
//STT solution by Florian Schulz
//http://florianschulz.info/stt/
//TTS solution by "Amnon"
//http://amnonp5.wordpress.com/2011/11/26/text-to-speech/
//Configuration file system added by [trademark]
//
//Excellent library for RC Codes at http://code.google.com/p/rc-switch/
//to activate listening via phone
import oscP5.*;
import netP5.*;
//import serial to talk to Arduino
import processing.serial.*;
//import minim for managing audio
import ddf.minim.spi.*;
import ddf.minim.signals.*;
import ddf.minim.*;
import ddf.minim.analysis.*;
import ddf.minim.ugens.*;
import ddf.minim.effects.*;
//speech to text library
import com.getflourish.stt.*;
//This will hold what was said by the user
String VCResult = "";
//load commands listed in file
String knownVCommands = "";
String loadedCommands[] = {};
// load configuration file
String configuration = "";
String loadedConfiguration[] = {};
String[] config = {};
boolean micOpen = false;
boolean said = false;
Serial port;
STT stt;
Minim minim;
OscP5 oscP5;
NetAddress netLoc;
void setup(){
size(600,400);
frame.setResizable(true);
stt = new STT(this);
stt.enableDebug();
stt.setThreshold(1.0);
stt.setLanguage("en");
VCResult = "System is ready for voice commands.";
minim = new Minim(this);
oscP5 = new OscP5(this, 8000);
//respond("Welcome to dokta vox beta 2 point Oh");
loadedCommands = loadStrings("knownVCommands.txt");
for(int i=0; i<loadedCommands.length; i++){
knownVCommands = knownVCommands + loadedCommands[i];
System.out.println(knownVCommands);
}
loadedConfiguration = loadStrings("config.txt");
for(int i=0; i<loadedConfiguration.length; i++){
configuration = configuration + loadedConfiguration[i];
println(configuration);
}
//TODO: change this to read config file
port = new Serial(this, "COM4", 9600);
}
//^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
void draw(){
}
//Add a way to activate the listening and not listening features via computer keyboard
void keyPressed(){
if(keyCode == CONTROL){
micOpen = true;
stt.begin();
}
// send Arduino the off command if the down arrow is pressed
if(keyCode == DOWN){
port.write('w');
}
// send Arduino the on command if the up arrow is pressed
if(keyCode == UP){
port.write('q');
}
}
// manually activate the mic via control key
void keyReleased(){
if(keyCode == CONTROL){
micOpen = false;
stt.end();
}
}
void transcribe (String utterance, float confidence)
{
println(utterance);
micOpen = false;
VCResult = utterance;
voiceCommands();
}
void stop() {
speak.close();
minim.stop();
super.stop();
}
The code is fairly simple. Pressing the Ctrl key will activate the
microphone. It acts as a PTT (push to talk) button similar to a
walkie-talkie. Pressing the UP or DOWN arrows will switch the first
outlet on and off. I added the arrow thing for a specific personal
application which requires it. You don't have to use it, but it is
handy for testing. You can see that I have
provisioned for a general config file. I don't have anything that uses
it at the moment, but eventually it could be used to add commands, COM
port, etc. The other file that is imported is the list of things to
understand and say.
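Since the config file is not used yet, its format is wide open. One possible approach, sketched here in Java, is a simple key=value file. Everything in this example is an assumption: the key names "port" and "baud", the "#" comment convention, and the class itself are hypothetical and not part of the project.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigSketch {
    // Parse lines of the form key=value, skipping blank lines and lines starting with '#'.
    static Map<String, String> parse(String[] lines) {
        Map<String, String> settings = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 1) {
                continue; // no key before '=', or no '=' at all
            }
            settings.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical config.txt contents.
        String[] lines = { "# serial settings", "port=COM4", "baud=9600" };
        Map<String, String> settings = parse(lines);
        String port = settings.getOrDefault("port", "COM4");
        int baud = Integer.parseInt(settings.getOrDefault("baud", "9600"));
        System.out.println(port + " @ " + baud);
    }
}
```

In Processing, loadStrings() already returns one String per line, so its result could be handed to a function like parse() directly.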
This next part will handle STT.
import java.io.File;
//store the state of each outlet
boolean one = false;
boolean two = false;
boolean three = false;
boolean four = false;
boolean five = false;
//These come straight from the voice command file
void voiceCommands(){
String[] vc = split(knownVCommands, ',');
// store what was said
String v = VCResult;
/*This is the model for a new command:
else if(v.equals(vc[next even number]) == true){
doStuffHere();
respond(vc[next odd number]);
}*/
if(v.equals(vc[0]) == true){
System.out.println("Success: How sweet it is");
respond(vc[1]);
}
else if(v.equals(vc[2]) == true){
background(0,255,0);
respond(vc[3]);
}
else if(v.equals(vc[4]) == true){
background(255,0,0);
respond(vc[5]);
}
else if(v.equals(vc[6]) == true){
background(0,0,255);
respond(vc[7]);
}
else if(v.equals(vc[8]) == true){
respond(vc[9]);
}
else if(v.equals(vc[10]) == true){
respond(vc[11]);
}
else if(v.equals(vc[12]) == true){
respond(vc[13]);
}
else if(v.equals(vc[14]) == true){
port.write('q');
one = true;
respond(vc[15]);
}
else if(v.equals(vc[16]) == true){
port.write('w');
one = false;
respond(vc[17]);
}
else if(v.equals(vc[18]) == true){
port.write('e');
two = true;
respond(vc[19]);
}
else if(v.equals(vc[20]) == true){
port.write('r');
two = false;
respond(vc[21]);
}
else if(v.equals(vc[22]) == true){
port.write('t');
three = true;
respond(vc[23]);
}
else if(v.equals(vc[24]) == true){
port.write('y');
three = false;
respond(vc[25]);
}
else if(v.equals(vc[26]) == true){
port.write('u');
four = true;
respond(vc[27]);
}
else if(v.equals(vc[28]) == true){
port.write('i');
four = false;
respond(vc[29]);
}
else if(v.equals(vc[30]) == true){
port.write('o');
five = true;
respond(vc[31]);
}
else if(v.equals(vc[32]) == true){
port.write('p');
five = false;
respond(vc[33]);
}
else if(v.equals(vc[34]) == true){
port.write('e');
two = true;
delay(30);
port.write('t');
three = true;
respond(vc[35]);
}
else if(v.equals(vc[36]) == true){
port.write('r');
two = false;
delay(30);
port.write('y');
three = false;
respond(vc[37]);
}
else if(v.equals(vc[38]) == true){
port.write('q');
one = true;
port.write('e');
two = true;
port.write('t');
three = true;
port.write('u');
four = true;
port.write('o');
five = true;
respond(vc[39]);
}
else if(v.equals(vc[40]) == true){
port.write('w');
one = false;
port.write('r');
two = false;
port.write('y');
three = false;
port.write('i');
four = false;
port.write('p');
five = false;
respond(vc[41]);
}
else if(v.equals(vc[52]) == true){
if(one){respond(vc[42]);}else{respond(vc[43]);}
delay(1000);
if(two){respond(vc[44]);}else{respond(vc[45]);}
delay(1000);
if(three){respond(vc[46]);}else{respond(vc[47]);}
delay(1000);
if(four){respond(vc[48]);}else{respond(vc[49]);}
delay(1000);
if(five){respond(vc[50]);}else{respond(vc[51]);}
delay(1000);
}
}
//make responses easier to write in the code.
//respond(string);
//this is an addition by [trademark], and it negates
//one of the original writer's comments on another part
//of the code, but it is marked to avoid confusion.
void respond(String say){
googleTTS(say, "en"); // add the language to the URL and pass all this to the other function
File old = sketchFile("lastThingSaid.mp3");
speak = minim.loadFile("lastThingSaid.mp3", 2048);
speak.play();
// don't clutter HDD with .mp3 files
if(old.exists()){old.delete();}
}
This code is a little more to take in, but is just as simple. To
understand the list of actions and commands, use the array position to locate the command in the knownVCommands.txt file.
else if(v.equals(vc[14]) == true){
port.write('q');
one = true;
respond(vc[15]);
}
To add a command (and response if you like), simply add another statement
like the one above and add the words to the text file. A simplified
explanation of the above code is this: "If what was said equals voice
command 14, send the Arduino a 'q', let the rest of the program know I
turned "one" on, and tell the user I did so." The last interesting
thing I did in this code was to break the pattern with the last
couple of commands. I added a "status check" command that will tell me
the state of each outlet. That is why the little list of if
statements is needed at the end.
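The status check reads its replies from positions 42 through 51 of the command list, two per outlet (the "on" reply first, then the "off" reply). Here is a small Java sketch of that arithmetic. The class and method names are hypothetical; only the index layout comes from the code above.

```java
public class StatusReply {
    // Position of the first status reply in the command list, per the code above.
    static final int FIRST_STATUS_INDEX = 42;

    // Index of the reply for a given outlet (1 to 5) and its current state.
    static int replyIndex(int outlet, boolean on) {
        return FIRST_STATUS_INDEX + (outlet - 1) * 2 + (on ? 0 : 1);
    }

    public static void main(String[] args) {
        boolean[] states = { true, false, false, true, false };
        for (int outlet = 1; outlet <= states.length; outlet++) {
            System.out.println("outlet " + outlet + " -> vc[" + replyIndex(outlet, states[outlet - 1]) + "]");
        }
    }
}
```

With the five states kept in an array like this, the five if/else lines could collapse into one loop, at the cost of being a little less obvious to read.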
Next up: TTS!
AudioPlayer speak;
import java.net.*;
import java.io.*;
void googleTTS(String txt, String language){
String u = "http://translate.google.com/translate_tts?tl=";
u = u + language + "&q=" + txt;
u = u.replace(" ", "%20");
try {
URL url = new URL(u);
try {
URLConnection connection = url.openConnection();
// This user-agent spoof is the loophole that lets this work. Google thinks the request comes from a browser (the string below identifies as Internet Explorer 6)
connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
connection.connect();
InputStream is = connection.getInputStream();
File f = new File("lastThingSaid.mp3");
OutputStream out = new FileOutputStream(f);
byte buf[] = new byte[1024];
int len;
while ((len = is.read(buf)) > 0) {
out.write(buf, 0, len);
}
out.close();
is.close();
println("File created");
} catch (IOException e) {
e.printStackTrace();
}
} catch (MalformedURLException e) {
e.printStackTrace();
}
said = false;
}
//The two comments below are from the original writer of this code. As seen in the STT file,
// this particular program will speak with the "respond(string)" syntax. The system defaults to English.
// To use the above system, use the following format: googleTTS(String, String);
// The first string is the words to say, and the second is the language slot. "en" for most uses.
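One thing to note about the URL building: the code above only replaces spaces, so a response containing characters like "&" or "?" would produce a broken query. Here is a hedged Java sketch of a more thorough approach using the standard java.net.URLEncoder. The class and method names are hypothetical, and the sketch only builds the string; it makes no request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TtsUrl {
    // Build the request URL with the spoken text percent-encoded.
    static String build(String text, String language) {
        try {
            // URLEncoder writes spaces as '+', so convert those to %20 to match the original code.
            String q = URLEncoder.encode(text, "UTF-8").replace("+", "%20");
            return "http://translate.google.com/translate_tts?tl=" + language + "&q=" + q;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Lamp has been turned on.", "en"));
    }
}
```

A literal "+" in the text is encoded as %2B before the replace runs, so the replacement only touches the spaces.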
Now we are cooking. The whole thing should be fully functional by now.
The last part is completely optional, but was necessary for me to be
able to use the system without being at my computer. I bought a Bluetooth headset, by the way, and that improved the accuracy of the speech recognition. It also gives me more freedom to move about. This
particular headset is cheap (I paid $9.99 when it was on sale) and
produces some bothersome static at just 15 feet from the computer. If
you have a nice one, use it! If not, you can buy a cheap one like I
did. It's still better than having to carry the computer around your
house.
I also added an iOS interface. I bought TouchOSC
($4.99) on the App Store a year or two ago for a different project.
It usually winds up finding its way into most of my projects. It is a
good system, and it is reasonably easy to use with Processing if you
download the library that is built for it.
When you download the editor (free) from the website linked above, you need
to open the OSC layout file included in the main download of this
project (above). You will be greeted with something similar to what you
see in the pictures. Hit "sync" and then allow it through your
firewall. Follow the documentation of the app to download the layout to
your iDevice. Note all this works with Android too. Teach the app the
IP address of your computer, and make sure the ports are the same.
Once you have this up and running, you can then control each outlet from
your pocket device. Someday I'd like to make the system listen by
pressing the button on the side of the Bluetooth headset, but these are
basically impossible to interface with. This is a quick and sensible
solution. Why use voice control if you have to press a button anyway?
Well, for people like you, I made the other iOS page with ON/OFF
switches. I'm also assuming that there aren't any people still reading
who don't appreciate the voice activation aspect :)
// /*
// if you wish to use something other than OSC, you can
// delete this tab, or uncomment the first and last lines
//Add a way to make the computer start listening from a remote location
int [] button = new int [51];
void oscEvent(OscMessage theOscMessage){
//println("Got a message");
String addr = theOscMessage.addrPattern();
//println("addr " + addr);
if(addr.equals("/2/push1")){ // exact match, so "/2/push10" does not also set button 1
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push2") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push3") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push4") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push5") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push6") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push7") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push8") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push9") != -1){
int i = int((addr.charAt(7))) - 0x30;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/2/push10") != -1){
int i = 10;
button[i] = int(theOscMessage.get(0).floatValue());
}
if(addr.indexOf("/1/push1") != -1){
int i = 11;
button[i] = int(theOscMessage.get(0).floatValue());
}
//===============================================================
// the main PTT button
if(button[11] == 0 && micOpen == true){
micOpen = false;
stt.end();
}
else if(button[11] == 1 && micOpen == false){
micOpen = true;
stt.begin();
}
if(button[1] == 1){
port.write('q');
println("Desk lamp on");
one = true;
}
if(button[2] == 1){
port.write('e');
two = true;
}
if(button[3] == 1){
port.write('t');
three = true;
}
if(button[4] == 1){
port.write('u');
four = true;
}
if(button[5] == 1){
port.write('o');
five = true;
}
if(button[6] == 1){
port.write('w');
one = false;
}
if(button[7] == 1){
port.write('r');
two = false;
}
if(button[8] == 1){
port.write('y');
three = false;
}
if(button[9] == 1){
port.write('i');
four = false;
}
if(button[10] == 1){
port.write('p');
five = false;
}
}
// */
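The OSC addresses follow a /page/pushN pattern, and an address like "/2/push10" contains "/2/push1" as a prefix, so substring tests need some care. As an alternative to one if block per button, here is a Java sketch that pulls the page and button numbers out of the address with a regular expression. The class and method names are hypothetical; only the address layout is taken from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OscAddress {
    // Matches addresses such as "/2/push10": a page number, then a push button number.
    private static final Pattern PUSH = Pattern.compile("^/(\\d{1,3})/push(\\d{1,3})$");

    // Returns {page, button}, or null if the address is not a push button.
    static int[] parse(String addr) {
        Matcher m = PUSH.matcher(addr);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] parsed = parse("/2/push10");
        System.out.println("page " + parsed[0] + ", button " + parsed[1]);
    }
}
```

With the numbers in hand, page 2 buttons could index the button array directly, and the page 1 button could be handled as the push-to-talk case.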
Well, by this time you should be showing your family and friends your great new project! Thanks for reading!
--trademark