-
1. Set up offline speech recognition
Open source speech recognition modules are becoming scarce, and the ones I tested did not perform well. The most recent one I could find is Vosk, but Vosk does not support the Raspberry Pi Zero. In the end I chose SpeechRecognition + pocketsphinx for offline speech recognition. To keep the recognition success rate high, after several rounds of testing I chose "my good friend" as the wake-up word and built a dedicated recognition file for it. You can pick your own wake-up word; my only suggestion is to check how reliably it is recognized.
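As a rough illustration of wake-word detection with SpeechRecognition + pocketsphinx, here is a minimal sketch. It is not the project's actual code: the dedicated recognition file mentioned above is not shown in this write-up, so the sketch instead assumes the `keyword_entries` option of `recognize_sphinx`. The helper names and the sensitivity value of 0.8 are made up for the example.

```python
import re

WAKE_WORD = "my good friend"  # wake-up phrase chosen in the text


def normalize(text):
    """Lowercase the text and keep only words, separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def contains_wake_word(transcript, wake_word=WAKE_WORD):
    """Return True if the wake word occurs as whole words in the transcript."""
    padded = f" {normalize(transcript)} "
    return f" {normalize(wake_word)} " in padded


def listen_for_wake_word(wake_word=WAKE_WORD, sensitivity=0.8):
    """Listen for one phrase and report whether it contains the wake word.

    Needs the third-party SpeechRecognition, pocketsphinx and PyAudio
    packages plus a microphone. The sensitivity value is an assumption.
    """
    import speech_recognition as sr  # lazy import: helpers above work without it

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # keyword_entries restricts pocketsphinx to spotting the given phrases
        transcript = recognizer.recognize_sphinx(
            audio, keyword_entries=[(wake_word, sensitivity)]
        )
    except sr.UnknownValueError:
        return False  # nothing intelligible was heard
    return contains_wake_word(transcript, wake_word)
```

Raising the sensitivity makes the wake word easier to trigger but also produces more false positives, which is one way to test how reliably a candidate phrase is recognized.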
-
2. Register for Baidu AI
For the cloud side, I chose the Baidu AI Open Platform as the foundation. Baidu AI provides cloud services in many areas, such as voice, image, text, map, and translation. It supports not only Chinese but also other languages such as English, so you can use Baidu AI directly if you like.
Before using Baidu AI, you need to register an account, create an application, and add the AI services you need to that application. Record the application's APPID, API_KEY, and SECRET_KEY; you will need them later.
Baidu AI can be called over HTTP or through the SDK that Baidu provides.
If you only want to test, most Baidu AI APIs offer a free trial lasting from six months to one year.
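For the HTTP route, the API key and secret key are first exchanged for an access token. The sketch below shows that exchange using only the standard library. The endpoint URL and parameter names reflect my understanding of Baidu's OAuth interface and should be checked against the current Baidu documentation; the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed token endpoint; verify against Baidu's current documentation.
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Build the token request URL from the application's two keys."""
    params = {
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    }
    return f"{TOKEN_URL}?{urllib.parse.urlencode(params)}"


def fetch_access_token(api_key, secret_key, timeout=10):
    """Request an access token (performs a network call)."""
    request = urllib.request.Request(
        build_token_url(api_key, secret_key), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if "access_token" not in payload:
        raise RuntimeError(f"token request failed: {payload}")
    return payload["access_token"]
```

The token is then passed along with each later API request, so it makes sense to fetch it once and reuse it until it expires.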
-
3. Record voice
There are two ways to record sound: trigger the recording manually, for example with a button, or record fully automatically. Because I chose a fully voice-controlled design, I use the latter.
The key to automatic recording is deciding when to start and when to stop. My strategy is to read audio through pyaudio and compare its volume against a threshold to determine the start and end of the recording.
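The threshold strategy can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the threshold values, the "three quiet chunks in a row" stop rule, and all function names are made up, and the audio is assumed to be 16-bit mono PCM in native byte order.

```python
import array
import math


def rms(chunk):
    """Root-mean-square volume of a chunk of 16-bit signed mono PCM bytes."""
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_utterance(chunks, start_threshold=1000, end_threshold=600,
                     max_silent_chunks=3):
    """Collect chunks from the first loud one until a run of quiet ones.

    Returns the recorded bytes (including the trailing quiet chunks),
    or b"" if no chunk ever reached start_threshold.
    """
    recorded = []
    silent_run = 0
    for chunk in chunks:
        volume = rms(chunk)
        if not recorded:
            if volume >= start_threshold:
                recorded.append(chunk)  # speech started
            continue
        recorded.append(chunk)
        if volume < end_threshold:
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break  # enough consecutive silence: speech ended
        else:
            silent_run = 0
    return b"".join(recorded)


def microphone_chunks(rate=16000, frames_per_chunk=1024):
    """Yield raw chunks from the default microphone (needs PyAudio)."""
    import pyaudio  # lazy import so the functions above work without it

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=frames_per_chunk)
    try:
        while True:
            yield stream.read(frames_per_chunk)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

On a device, `record_utterance(microphone_chunks())` would return one utterance. Using a lower threshold for stopping than for starting keeps short dips in volume from cutting the recording off mid-sentence.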
-
4. ASR and TTS
I use the voice module of Baidu AI for ASR and TTS. Here I access Baidu AI over HTTP. The code is more complex this way, but it also demonstrates how cloud services are generally called.
The language to use can be selected through a request parameter.
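To make the HTTP call pattern concrete, here is a hedged sketch of a speech recognition request. The endpoint, the JSON field names, and the `dev_pid` language codes (1537 for Mandarin, 1737 for English) reflect my understanding of Baidu's short-speech REST interface and must be verified against the official documentation. The function names and the device id are made up for this example, and the token is assumed to have been obtained beforehand.

```python
import base64
import json
import urllib.request

# Assumed endpoint and language codes; verify against Baidu's documentation.
ASR_URL = "http://vop.baidu.com/server_api"
DEV_PID = {"mandarin": 1537, "english": 1737}


def build_asr_payload(pcm_bytes, token, language="mandarin", rate=16000,
                      device_id="pi-zero-demo"):
    """Build the JSON body for one recognition request.

    pcm_bytes is raw 16-bit mono PCM audio; device_id is a hypothetical
    identifier for this device.
    """
    return {
        "format": "pcm",
        "rate": rate,
        "channel": 1,
        "cuid": device_id,
        "token": token,
        "dev_pid": DEV_PID[language],  # the parameter that selects the language
        "speech": base64.b64encode(pcm_bytes).decode("ascii"),
        "len": len(pcm_bytes),  # length of the raw audio, before base64
    }


def recognize(pcm_bytes, token, language="mandarin", timeout=10):
    """Send audio for recognition and return the best text (network call)."""
    body = json.dumps(build_asr_payload(pcm_bytes, token, language))
    request = urllib.request.Request(
        ASR_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    if result.get("err_no") != 0:
        raise RuntimeError(f"recognition failed: {result}")
    return result["result"][0]
```

Keeping the payload construction separate from the network call makes the request easy to inspect and to adapt if the parameters differ from what is assumed here.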
-
5. A workflow engine based on a data bus
Basic design:
- Each function is driven by a process configuration file
- Each process can have multiple steps
- Each step can consist of inputs (one or more), a service, and an output, but none of them is required (a step can be like a page with only one button, so steps can also be used to build a simple UI)
- Each step exchanges data through the data bus
- The steps are scheduled by the workflow engine according to the process rules
[
  {
    "name": "PlantDetect",
    "type": "extend",
    "command": "植物。",
    "steps": [
      {
        "step": { "name": "inputPicture", "action": { "name": "InputPicture" } },
        "actions": { "default": "next" }
      },
      {
        "step": { "name": "plantDetect", "action": { "name": "PlantDetect" }, "output": "message" },
        "actions": { "default": "exit" }
      }
    ]
  },
  {
    "name": "AnimalDetect",
    "type": "extend",
    "command": "动物。",
    "steps": [
      {
        "step": { "name": "inputPicture", "action": { "name": "InputPicture" } },
        "actions": { "default": "next" }
      },
      {
        "step": { "name": "animalDetect", "action": { "name": "AnimalDetect" }, "output": "message" },
        "actions": { "default": "exit" }
      }
    ]
  }
]
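To show how such a configuration could be executed, here is a minimal engine sketch. It is my own interpretation of the format, not the project's engine: I assume that each action is a function receiving the data bus, that a step's result is stored on the bus under the step name and under its "output" key if present, and that only the "next" and "exit" rules exist. The stub actions stand in for the real camera and Baidu calls.

```python
def find_process(config, command):
    """Return the process whose voice command matches, or None."""
    for process in config:
        if process["command"] == command:
            return process
    return None


def run_process(process, actions, bus=None):
    """Run the steps of one process in order and return the data bus."""
    bus = {} if bus is None else bus
    steps = process["steps"]
    index = 0
    while index < len(steps):
        entry = steps[index]
        step = entry["step"]
        handler = actions[step["action"]["name"]]
        result = handler(bus)        # the service reads its inputs from the bus
        bus[step["name"]] = result   # later steps can look up this step's result
        if "output" in step:
            bus[step["output"]] = result
        rule = entry.get("actions", {}).get("default", "next")
        if rule == "exit":
            break
        if rule != "next":
            raise ValueError(f"unsupported rule: {rule}")
        index += 1
    return bus


# Usage with stub actions (hypothetical stand-ins for the real services).
CONFIG = [{
    "name": "PlantDetect", "type": "extend", "command": "植物。",
    "steps": [
        {"step": {"name": "inputPicture", "action": {"name": "InputPicture"}},
         "actions": {"default": "next"}},
        {"step": {"name": "plantDetect", "action": {"name": "PlantDetect"},
                  "output": "message"},
         "actions": {"default": "exit"}},
    ],
}]
ACTIONS = {
    "InputPicture": lambda bus: "photo.jpg",
    "PlantDetect": lambda bus: f"plant found in {bus['inputPicture']}",
}
bus = run_process(find_process(CONFIG, "植物。"), ACTIONS)
```

Because steps only communicate through the bus, a new function can be assembled from existing actions by writing a new configuration entry, with no change to the engine.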
-
6. Build more actions and functions
The basic runtime environment is now in place, and you can add whatever functions you need. So far the code integrates parts of Baidu's image recognition, speech recognition, text recognition (OCR), and map services.
-
7. One last interesting thing
In most cases no hard coding is needed: new functions can be built by combining existing services. With that in mind, I designed a way to install new functions from a QR code. You only need to scan the QR code that contains a function's configuration to install that function.
{
  "name": "ReadQRCode",
  "type": "extend",
  "command": "说明。",
  "steps": [
    {
      "step": { "name": "inputPicture", "action": { "name": "InputPicture" } },
      "actions": { "default": "next" }
    },
    {
      "step": { "name": "QRCodeDetect", "action": { "name": "QRCodeDetect" }, "output": "message" },
      "actions": { "default": "exit" }
    }
  ]
}
This demo function reads out the text contained in a scanned QR code.
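Installing a function from a scanned code could look roughly like the sketch below. It assumes the QR code has already been decoded to a JSON string, and the validation rules (required keys, known action names, replacing a function with the same name) are my own assumptions rather than the project's behavior.

```python
import json

REQUIRED_KEYS = {"name", "type", "command", "steps"}


def install_from_qr_text(qr_text, installed, known_actions):
    """Validate a scanned process config and return the updated config list.

    installed is the current list of process configs; known_actions is the
    set of action names the device can execute.
    """
    process = json.loads(qr_text)
    if not isinstance(process, dict):
        raise ValueError("QR content must be a single JSON object")
    missing = REQUIRED_KEYS - process.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for entry in process["steps"]:
        action = entry["step"]["action"]["name"]
        if action not in known_actions:
            raise ValueError(f"unknown action: {action}")
    # Replace any previously installed function that has the same name.
    updated = [p for p in installed if p["name"] != process["name"]]
    updated.append(process)
    return updated


# Usage with the configuration shown above, as it would arrive from a scan.
QR_TEXT = (
    '{"name":"ReadQRCode","type":"extend","command":"说明。","steps":['
    '{"step":{"name":"inputPicture","action":{"name":"InputPicture"}},'
    '"actions":{"default":"next"}},'
    '{"step":{"name":"QRCodeDetect","action":{"name":"QRCodeDetect"},'
    '"output":"message"},"actions":{"default":"exit"}}]}'
)
functions = install_from_qr_text(QR_TEXT, [], {"InputPicture", "QRCodeDetect"})
```

Checking the action names before installing means a scanned configuration can only combine services that already exist on the device.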
-
8. Maybe more
If you like my project and offer useful suggestions, I may develop it further.
In fact, I have more "advanced" designs in mind that are worth looking forward to.