Aiming for a "Robot that Talks with People"
One of my goals was to create a "robot that talks with people" through Stack-chan.
Stack-chan is already cute without doing anything, but I have been working on development with the dream of a future where Stack-chan can provide advice to users, give praise and encouragement, and engage in playful communication with other Stack-chans.
I had thought that it would take another 1-2 years to fully integrate dialogue management, but with the arrival of ChatGPT, I feel like the future has come in a giant leap!
So, to achieve a "robot that talks with people," I've implemented and tested the ChatGPT integration feature!
When users talk to Stack-chan, Stack-chan responds with a cute voice.
When users ask Stack-chan to introduce itself, it replies, "I am a robot called Stack-chan." In other words, ChatGPT is acting as Stack-chan. As those who use the web version of ChatGPT may know, by starting the conversation with the setting "You are a super cute robot called Stack-chan," ChatGPT will generate responses adhering to this setting.
Demo Mechanism
The mechanism is as follows. Heavy processes such as speech recognition and synthesis are performed on an external PC.
- The PC recognizes the user's voice (using the VOSK speech recognition library).
- The PC sends the recognized text to Stack-chan.
- Stack-chan sends the user's message to the ChatGPT API and receives a reply from the AI.
  - Along with the API key used for authentication, Stack-chan sends an array of chat messages.
  - Each chat message has a role and a content field. The role is user, assistant (the AI), or system, and the content holds the actual message text.
  - With the system role, you can give ChatGPT character settings and instructions for how to respond.
- The AI's reply is converted to audio data (using the VOICEVOX speech synthesis engine).
- The audio data is played back.
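The request in the third step can be sketched roughly as follows. This is a minimal illustration, not the firmware's actual code: `buildChatRequest` and `askChatGPT` are hypothetical names, the model name is an assumption, and the standard `fetch` function is assumed to be available (the real firmware may use a different HTTP client).

```javascript
// Character setting sent with the system role on every request.
const SYSTEM_PROMPT = "You are a super cute robot called Stack-chan.";

// Hypothetical helper: builds the JSON body for one chat request.
// `history` holds earlier { role, content } turns; `userText` is the
// text recognized from the user's speech.
function buildChatRequest(history, userText, model = "gpt-3.5-turbo") {
  const messages = [
    { role: "system", content: SYSTEM_PROMPT }, // character setting
    ...history,                                  // earlier user/assistant turns
    { role: "user", content: userText },         // the new user message
  ];
  return { model, messages };
}

// Hypothetical helper: sends the request and returns the AI's reply text.
// Assumes a standard fetch() is available in the runtime.
async function askChatGPT(apiKey, history, userText) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // authentication API key
    },
    body: JSON.stringify(buildChatRequest(history, userText)),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Keeping the system message at the head of the array on every request is what keeps ChatGPT in character, since the API itself does not remember earlier calls.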
You can try this demo on the Moddable version of the firmware I'm developing. Since all the basic modules are implemented in JavaScript, it is approachable for web engineers and for those unfamiliar with Arduino (C++).
In the future, I plan to improve the ease of getting started and usability, such as allowing users to write apps using a web browser without setting up an environment.
Impressions of Using ChatGPT
First and foremost, I was amazed by the naturalness of the responses! I truly felt like I was having a proper conversation with a robot for the first time.
The API is also easy to use and can be quickly integrated into various systems.
On the other hand, ChatGPT is ultimately an AI designed for "text-based chat," and there are challenges when integrating it fully into a communication robot for "voice-based conversation."
- Response accuracy: Sometimes the given character setting is not effective, or the response may be stiff (AI's limitation showing?). The demo used the GPT-3.5 model, but the next-generation GPT-4 has already been released, and we can expect improved accuracy.
- Response speed: There is a lag of a few seconds before a reply is given. It is necessary to devise ways to let users know that the robot is processing the request, such as changing Stack-chan's facial expression.
- Dialogue management: When the user pauses mid-speech, the robot has to decide whether to respond or keep listening. Getting this right would allow more natural conversations than the "one question, one answer" interactions common with smart speakers.
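For the response speed issue, one possible approach is to wrap the slow request so that the face changes while the robot is waiting. The sketch below assumes a hypothetical `face` object with a `setExpression` method; it is not the firmware's real interface, and the expression names are made up for illustration.

```javascript
// Runs `task` (for example, the ChatGPT request) while showing a
// "thinking" face, then restores the neutral face afterwards.
// `face.setExpression` is a hypothetical interface for this sketch.
async function withThinkingFace(face, task) {
  face.setExpression("thinking");
  try {
    return await task();
  } finally {
    // Restore the face even if the request fails.
    face.setExpression("neutral");
  }
}
```

A call such as `withThinkingFace(face, () => requestReply(text))`, where `requestReply` stands in for whatever function performs the request, would then show the user that their question is being processed during the few seconds of lag.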
Application Ideas
Various modifications can be made from this configuration.
- Have Stack-chan speak not only in response to the user's speech but also at specific times (e.g., "It's 12 o'clock. Let's take a lunch break!"), on changes in sensor values (e.g., "CO2 levels are increasing, should we open a window?"), or on triggers from other web APIs.
- Increase the number of Stack-chans. In addition to user and Stack-chan interactions, have conversations between Stack-chans.
- Embed special commands in ChatGPT's responses to perform actions beyond speech. Connect with external APIs and have Stack-chan control devices like TVs or PCs.
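As a rough sketch of the last idea, the reply text could be split into the part to speak and a list of commands to execute. The `[[name:argument]]` syntax and the `parseReply` function are made up for this example; the system message would have to instruct ChatGPT to use whatever format is chosen.

```javascript
// Splits a reply into speech text and embedded commands.
// The [[name:argument]] syntax is a hypothetical convention for this sketch.
function parseReply(reply) {
  const commands = [];
  const pattern = /\[\[(\w+):([^\]]*)\]\]/g;
  const speech = reply
    .replace(pattern, (_match, name, argument) => {
      commands.push({ name, argument: argument.trim() });
      return ""; // remove the command from the spoken text
    })
    .replace(/\s+/g, " ") // collapse gaps left by removed commands
    .trim();
  return { speech, commands };
}

const result = parseReply("Sure, turning it on! [[tv:on]]");
// result.speech   → "Sure, turning it on!"
// result.commands → [{ name: "tv", argument: "on" }]
```

Only `speech` would be passed to speech synthesis, while each entry in `commands` could be dispatched to a handler that calls the relevant external API.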
What would you do with Stack-chan?