# NOOX
## 1. Project Introduction
NOOX is an open-source intelligent hardware platform designed for scenarios involving "external hardware + desktop automation + Large Language Models (LLMs)". It emphasizes plug-and-play convenience and portability, aiming to achieve intelligent control and automated operations of host computers through a single ESP32-S3-based device. NOOX combines hardware control, desktop automation, and AI autonomous planning capabilities, providing users with a powerful intelligent desktop assistant and development tool.
## 2. Features
The NOOX platform offers the following core features:
* **USB Composite Device**: The device connects to the host via a USB Type-C interface, simultaneously emulating USB HID (Keyboard/Mouse) and USB CDC (Virtual Serial Port) for bidirectional communication and control with the host.
* **AI Integration and Autonomous Planning**:
* Supports multiple LLM providers (OpenRouter / DeepSeek / OpenAI-compatible APIs).
* In advanced mode, the AI can perform multi-step planning based on context and command output, automatically invoke tools, and iterate execution until the user's goal is achieved.
* Supports dialogue history management to maintain multi-turn conversation context.
* **Web Control Console**: Built-in HTTP and WebSocket servers provide an intuitive Web UI for device configuration (LLM provider, API key, WiFi settings) and real-time AI conversations.
* **Host Agent Auto-Bootstrapping**: The device uses USB HID to emulate keyboard operations, automatically opening PowerShell on the host and using `Invoke-WebRequest` to download and execute the cross-platform host agent program from the device's built-in web server.
* **Cross-Platform Shell Interaction**: The host agent program communicates with the device via USB CDC using JSON, enabling cross-platform Shell command execution and feedback (supporting powershell/pwsh/cmd/bash/sh, etc.).
* **LLM Tool Capabilities**: Tools callable by the LLM include:
* `run_command`: Execute Shell commands on the host.
* `hid_keyboard_type`: Type text via HID keyboard.
* `hid_keyboard_press`: Simulate key combinations or special keys.
* `hid_keyboard_macro`: Execute a series of keyboard operation macros.
* `gpio_set`: Control GPIO/LED pins on the ESP32-S3 device.
* **Multi-Channel Interaction**: In addition to the Web interface, the device offers local UI interaction via an OLED screen and physical buttons, displaying status information and enabling menu navigation.
* **Dynamic Configuration Management**: Supports runtime modification of LLM provider, API key, and WiFi configurations via the Web interface, with persistent storage.
## 3. Usage
The NOOX platform offers multiple interaction methods, allowing users to flexibly interact with the device and AI.
### 3.1 Web Interface Interaction
Access the Web console by browsing to the device's IP address (viewable on the OLED screen).
1. **Chat Mode**:
* Enter natural language messages in the input box on the main interface and click "Send" or press Enter.
* The AI will respond in natural language and supports Markdown rendering for clear, formatted information.
* Switch to this mode using the "Chat" button in the top bar.
2. **Advanced Mode**:
* Switch to this mode using the "Advanced" button in the top bar.
* In this mode, the AI can autonomously plan and invoke tools (e.g., `run_command`, `hid_keyboard_type`, `gpio_set`) to complete complex tasks.
* Users can observe the AI's tool invocation process and results, and continue inputting instructions.
3. **Settings Management**:
* Click the "Settings" button in the top bar to open the sidebar.
* **LLM Settings**: Configure LLM provider, model, and API Key.
* **WiFi Management**: View, connect, disconnect, or delete saved WiFi networks.
* **Add/Update WiFi**: Manually enter SSID and password to save new WiFi networks.
* All changes must be applied by clicking the "Save and Apply" button to synchronize with the device.
4. **GPIO Control**:
* In the Advanced Mode panel, GPIO 1 and GPIO 2 on the device can be directly controlled via switches.
### 3.2 USB Shell Interaction
Connect the device's OTG-USB to the host. The device will automatically download and launch the agent on the host. If the device is not connected to a local network, it will first attempt to obtain the host's connected local network and connect.
**Note**: The device will only attempt to connect to the host during initial startup. Therefore, connect the host device via OTG-USB Type-C to power it on before the device starts. If you connect the host after the device has initialized, simply press the RST button to restart.
After successful initialization, users can interact directly with the device in the terminal.
1. **Natural Language Dialogue**:
* Enter natural language commands in the terminal where the agent program is running. The device will forward the input to the LLM for processing.
* The AI's responses and tool execution results will be displayed directly in the terminal.
2. **AI Autonomous Planning**:
* Based on user instructions and Shell command output, the LLM will automatically select and execute tools like `run_command` to achieve desktop automation.
* For example, you can ask the AI to "create a README.txt on the desktop and write content to it." The AI will autonomously plan and execute the corresponding Shell commands.
**Tip**: Dialogue information between the Web interface and USB Shell is shared, allowing you to instruct the AI to perform host-side Shell operations across devices via the Web interface.
### 3.3 OLED Screen and Button Interaction
The OLED screen and physical buttons on the device provide a local interaction interface that does not require host connection.
1. **Status Display**:
* The OLED screen displays the current LLM mode, model in use, WiFi SSID, IP address, and DRAM memory usage.
2. **Menu Navigation**:
* Use the buttons to navigate between different menu items, such as entering the WiFi settings menu or viewing system information.
* Within the WiFi settings menu, you can view saved networks and perform connect/disconnect operations.
## 4. Web Interface Features

NOOX's Web interface (`index.html`, `style.css`, `script.js`) provides a modern, responsive user experience, featuring:
* **Top Bar**: Displays the project name "NOOX" and includes mode switching buttons (Chat Mode / Advanced Mode) and a settings button.
* **Chat Area**: The main interface for displaying user input and AI responses. **Supports Markdown rendering**, with AI messages featuring a pulsing animation effect.
* **Input Box**: A text area for user message input and a send button, supporting Enter to send (Shift+Enter for new lines).
* **Advanced Mode Panel**: Displayed in Advanced Mode, containing functional modules such as:
* **Shell Interaction**: A banner for the command-line terminal simulation module.
* **HID Operations**: A banner for the keyboard and mouse control module.
* **GPIO Operations**: An interactive module providing switch control for GPIO 1 and GPIO 2.
* **Toast Notification System**: Displays success, error, or information prompts in the top-right corner of the page.
* **Settings Sidebar**: Slides out from the left, offering detailed configuration options:
* **LLM Settings**: Select LLM provider and model, enter API Key.
* **WiFi Management**: Displays a list of saved WiFi networks, offering connect, disconnect, and delete (forget) functionalities.
* **Add/Update WiFi**: Enter a new SSID and password to save a network.
* **Save and Apply**: Saves all settings to the device and applies them immediately.
The interface adopts a dark theme, elegant and concise.
## 5. Software and Hardware Technical Details
### 5.1 Hardware

* **Main Controller**: ESP32-S3-WROOM-1 module, dual-core 32-bit CPU @ 240 MHz, with built-in 16 MB Flash and 8 MB PSRAM. Supports WiFi (802.11 b/g/n 2.4 GHz) and USB 2.0 OTG.
* **Display**: 0.96-inch 128x64 pixel OLED screen (SSD1315 driver, I2C interface, GPIO4-SDA, GPIO5-SCL).
* **Input**: Three high-level trigger physical buttons (GPIO47-OK, GPIO21-Up, GPIO38-Down), configured as internal pull-down inputs.
* **Output**: Three monochrome LEDs illuminated by high-level signals (GPIO41, GPIO40, GPIO39) and one WS2812 RGB LED (GPIO48).
* **Expansion Interfaces**: Reserved I2C (GPIO1-SDA, GPIO2-SCL), SPI (TF card, GPIO6-CS, GPIO15-SCK, GPIO16-MISO, GPIO7-MOSI), and UART (GPIO17-TX, GPIO18-RX) interfaces, as well as three general-purpose GPIOs (GPIO8, GPIO9, GPIO10).
* **Power**: USB Type-C interface for power (5V) and a polymer lithium battery (4.2V) for power with automatic switching.
#### Circuit Detail Analysis
This is a classic dual power automatic switching circuit. When the device is connected to USB, the PMOS transistor is turned off. Even if current flows through the body diode, because VUSB is higher than Vbat, Vgs of the PMOS is > 0, so the PMOS body diode is off. The load is supplied by VUSB (5V), and the lithium battery is charged simultaneously. When the USB is disconnected, the PMOS turns on, and the load is supplied by VBat.
**Note:** You need to turn on switch Q1 on the device to enable battery power.
### 5.2 Software
* **Development Framework**: Arduino Framework.
* **Integrated Development Environment (IDE)**: PlatformIO (VS Code extension).
* **Real-Time Operating System (RTOS)**: FreeRTOS (built into ESP-IDF).
* **File System**: LittleFS, used for storing configuration files (`config.json`), web static files (`index.html.gz`, `style.css.gz`, `script.js.gz`), and the host agent program.
* **Core Library Dependencies**:
* `U8g2`: OLED display driver.
* `FastLED`: RGB LED control.
* `AsyncTCP` and `ESPAsyncWebServer`: Asynchronous TCP and Web server.
* `WebSockets`: WebSocket communication support.
* `ArduinoJson`: JSON data serialization and deserialization.
* **Host Agent Program**: Developed using Go, compiled into a self-contained native executable file, enabling cross-platform Shell command execution.
### 5.3 Architecture
#### Architecture Diagram

Generated by gpt5.1-codex, for reference only.
If unclear, please refer to GitHub.
* **Overall Architecture**: The system consists of the ESP32-S3 device, host computer (running NOOX Host Agent and Shell), web browser (Web Client), and cloud-based LLM providers.
* **Module Responsibilities**:
* `ConfigManager`: Manages configuration file reading/writing and LittleFS.
* `HardwareManager`: Controls GPIO, OLED, LEDs, and buttons.
* `WiFiManager`: Manages WiFi connection and network status.
* `LLMManager`: Core module responsible for LLM API calls, dialogue history, tool call parsing and execution, and dual-mode switching.
* `UIManager`: Manages OLED display, button input, and menu navigation.
* `HIDManager`: Emulates USB keyboard/mouse to implement shortcuts and macro operations.
* `WebManager`: Provides Web server, WebSocket communication, static file hosting, and configuration updates.
* `UsbShellManager`: Handles USB CDC serial communication, JSON messages, and Shell command forwarding.
* **Task Scheduling**: Utilizes FreeRTOS multi-task scheduling, allocating network tasks (WebTask, LLMTask) to Core 0 and computation tasks (UITask, USBTask) to Core 1 for performance optimization.
* **Memory Management**: Fully leverages the ESP32-S3's 8 MB PSRAM to store LLM responses, dialogue history, and large JSON documents, preventing DRAM fragmentation and memory overflows. Employs explicit memory ownership and unified allocation/deallocation functions to prevent memory leaks.
### 5.4 Communication Protocols
* **WebSocket Protocol**: The Web client and ESP32 device communicate bidirectionally via the `/ws` endpoint, transmitting chat messages, LLM mode settings, dialogue history clearing, GPIO control, etc.
* **USB CDC Protocol**: The host agent program and the ESP32 device communicate via a virtual serial port, with all messages being single-line JSON followed by a newline character. This includes user input, link tests, Shell command execution results, requests to execute Shell commands, and AI responses. WiFi credentials are sent in a special text format "SSID|Password".
Read the source code's `TECHNICAL_SPECIFICATION.md` for details.
#### Agent Download and Startup Sequence Diagram

Generated by gpt5.1-codex, for reference only.
* **LLM API Protocol**: The ESP32 device calls LLM providers such as DeepSeek, OpenRouter, and OpenAI-compatible APIs via HTTPS. Requests and responses are in JSON format, and in advanced mode, they include tool definitions and tool calls.
Read the `TECHNICAL_SPECIFICATION.md` document for details.
#### LLM Interaction Sequence Diagram

Generated by gpt5.1-codex, for reference only.
## 6. Quick Start
1. **Prepare Environment**:
* Install VS Code and the PlatformIO plugin.
* Install Python 3 (for running packaging scripts).
2. **Add Configuration**:
* Replace the placeholder API keys and WiFi information with your own in `config_manager.cpp`.
* Add desired model names (optional).
3. **Pull Dependencies and Compile Firmware**:
* Open the project in VS Code/PlatformIO.
* Build firmware: `pio run`.
4. **Prepare LittleFS Data**:
* Run `python compress_files.py` to generate compressed web files in the `data/` directory, and copy/compress the host agent to `data/agent/`.
* Upload LittleFS file system: `pio run --target uploadfs`.
5. **Burn Firmware and Connect Device**:
* Upload firmware: `pio run --target upload`.
* Connect to the host via Type-C; the system will be recognized as an HID + CDC device.
6. **Configure WiFi**:
* During the first startup, the device will inject a script via HID to guide the host to obtain and fill in WiFi credentials.
* After the initial connection is successful, you can open a browser to access the device's IP address (viewable on the OLED status page) to enter the Web UI and configure WiFi.
7. **Automatic Download and Run Host Agent**:
* The device opens PowerShell via HID and executes `Invoke-WebRequest` to download the agent to a temporary directory from `http://<device_IP>/api/agent/download?platform=windows` and then launches it.
* The agent establishes a JSON channel with the device via USB CDC and begins interaction.
## 7. Security and Risks
The NOOX platform offers powerful automation capabilities. However, in advanced mode, the LLM can initiate `run_command` (arbitrary Shell commands) and HID operations (emulating keyboard/shortcuts). This means that in certain situations, the device **may execute dangerous operations** (such as deleting files, network penetration, data exfiltration, etc.). **Please ensure you run the Agent within a standard user session and avoid running it with administrator/root privileges. Use with caution and at your own risk. It is recommended to evaluate in an isolated, rollback-capable environment before widespread use.**
## 8. Performance and Limitations
* **CDC Output Limit**: stdout/stderr is limited to approximately 20KB per item; exceeding this will be truncated.
* **CDC Reading**: The USB task executes every 3ms, reading a maximum of 512B per read, with an input buffer limit of 64KB.
* **LLM API Calls**: Response times depend on the model and network latency, typically ranging from 2 to 15 seconds.
* **Concurrency**: WebSocket supports a maximum of 4 clients, and the LLM request queue depth is 3.
## This project's software development was assisted by AI.
---
### All project source code, documentation, and diagrams are open-sourced on GitHub and the OSHW Lichee Project Platform.
### https://github.com/hurricane-0/NOOX
### https://www.bilibili.com/video/BV11TCvB2EbS/
hurricane_5