1. Background
The BW21-CBV-Kit launched by Ai-Thinker is an open-source hardware development platform for intelligent sensing applications. Its core features include:
- AI Vision Capabilities:
Supports on-device AI image recognition algorithms, enabling scenarios such as face recognition, gesture detection, and object identification, with HD image capture via a 2MP camera.
- Processing Performance:
Equipped with dual-band Wi-Fi (2.4GHz + 5GHz) and a high-performance RTL8735B processor (500MHz), meeting real-time image processing and wireless transmission requirements.
- Open-source Ecosystem:
Built on the Arduino development environment, providing rich interfaces such as PWM control and sensor drivers, enabling users to develop custom network camera systems.
Compared with traditional closed-source smart cameras, the BW21-CBV-Kit provides unique advantages:
- Freedom from System Restrictions:
Developers can bypass vendor constraints to create local storage or private cloud solutions, avoiding cloud data leakage risks.
- Extended Scenario Integration:
Supports peripherals such as DHT temperature/humidity sensors and ultrasonic ranging modules, allowing smart home security, environmental monitoring, and other multi-functional applications.
- Development Efficiency:
Provides HomeAssistant integration and pre-built AI model calls, significantly lowering the development barrier for home-grade smart cameras.
This device has strong potential applications in smart homes, industrial visual inspection, and new retail behavior analysis. Its open-source nature is particularly suitable for maker communities and SMEs seeking customized visual solutions.
Many manufacturers currently offer smart cameras, but most are closed-source and tied to specific vendor software and cloud services. The XiaoAnPai (BW21-CBV-Kit) camera can instead be used to build a home network camera running a fully custom software system.
2. Objectives
The core goal of this project is to develop an intelligent home security monitoring system based on the XiaoAnPai hardware platform. The web interface should be as user-friendly as possible, with the ability to select video files by date. The system focuses on three key functional modules:
1. 24/7 audio and video acquisition system
- Enables 24/7 uninterrupted video recording, using H.264 encoding technology to ensure 1080P HD quality;
- Integrates facial recognition algorithms for intelligent monitoring of suspicious targets;
2. Tiered storage architecture
- Builds a centralized storage mechanism: NAS subscribes to RTSP audio and video streams and writes them to disk.
3. Cloud-based interactive system
- Builds a streaming media service platform based on a B/S architecture, supporting m3u8 video playback.
- Develops a video retrieval module for retrieving daily surveillance footage.
The overall system design adheres to a collaborative "end-edge-cloud" architecture, ultimately creating a complete home security solution while ensuring data security.
3. Design Approach
3.1 Hardware Design
Development Board: BW21-CBV-Kit connected to a GC2053 camera.
Camera Enclosure: DIY casing made using the packaging box and manual assembly.
3.2 Software Design
3.2.1 Microcontroller
On the microcontroller side, an RTSP video stream is output while detected faces are marked in the video.
#include "WiFi.h"
#include "StreamIO.h"
#include "VideoStream.h"
#include "RTSP.h"
#include "NNFaceDetection.h"
#include "VideoStreamOverlay.h"
#define CHANNEL 0
#define CHANNELNN 3
#ifndef MAX_FACE_DET
#define MAX_FACE_DET 24 // cap on OSD draw operations per frame; define here if not provided by the SDK headers
#endif
// Lower resolution for NN processing
#define NNWIDTH 576
#define NNHEIGHT 320
VideoSetting config(VIDEO_FHD, 30, VIDEO_H264, 0);
VideoSetting configNN(NNWIDTH, NNHEIGHT, 10, VIDEO_RGB, 0);
NNFaceDetection facedet;
RTSP rtsp;
StreamIO videoStreamer(1, 1);
StreamIO videoStreamerNN(1, 1);
char ssid[] = "Network_SSID"; // your network SSID (name)
char pass[] = "Password"; // your network password
int status = WL_IDLE_STATUS;
IPAddress ip;
int rtsp_portnum;
void setup()
{
Serial.begin(115200);
// attempt to connect to Wifi network:
while (status != WL_CONNECTED) {
Serial.print("Attempting to connect to WPA SSID: ");
Serial.println(ssid);
status = WiFi.begin(ssid, pass);
// wait 2 seconds for connection:
delay(2000);
}
ip = WiFi.localIP();
// Configure camera video channels with video format information
// Adjust the bitrate based on your WiFi network quality
config.setBitrate(2 * 1024 * 1024); // Recommend to use 2Mbps for RTSP streaming to prevent network congestion
Camera.configVideoChannel(CHANNEL, config);
Camera.configVideoChannel(CHANNELNN, configNN);
Camera.videoInit();
// Configure RTSP with corresponding video format information
rtsp.configVideo(config);
rtsp.begin();
rtsp_portnum = rtsp.getPort();
// Configure face detection with corresponding video format information
// Select Neural Network(NN) task and models
facedet.configVideo(configNN);
facedet.setResultCallback(FDPostProcess);
facedet.modelSelect(FACE_DETECTION, NA_MODEL, DEFAULT_SCRFD, NA_MODEL);
facedet.begin();
// Configure StreamIO object to stream data from video channel to RTSP
videoStreamer.registerInput(Camera.getStream(CHANNEL));
videoStreamer.registerOutput(rtsp);
if (videoStreamer.begin() != 0) {
Serial.println("StreamIO link start failed");
}
// Start data stream from video channel
Camera.channelBegin(CHANNEL);
// Configure StreamIO object to stream data from RGB video channel to face detection
videoStreamerNN.registerInput(Camera.getStream(CHANNELNN));
videoStreamerNN.setStackSize();
videoStreamerNN.setTaskPriority();
videoStreamerNN.registerOutput(facedet);
if (videoStreamerNN.begin() != 0) {
Serial.println("StreamIO link start failed");
}
// Start video channel for NN
Camera.channelBegin(CHANNELNN);
// Start OSD drawing on RTSP video channel
OSD.configVideo(CHANNEL, config);
OSD.begin();
}
void loop()
{
// Do nothing
}
// User callback function for post processing of face detection results
void FDPostProcess(std::vector<FaceDetectionResult> results)
{
int count = 0;
uint16_t im_h = config.height();
uint16_t im_w = config.width();
Serial.print("Network URL for RTSP Streaming: ");
Serial.print("rtsp://");
Serial.print(ip);
Serial.print(":");
Serial.println(rtsp_portnum);
Serial.println(" ");
printf("Total number of faces detected = %d\r\n", facedet.getResultCount());
OSD.createBitmap(CHANNEL);
if (facedet.getResultCount() > 0) {
for (int i = 0; i < facedet.getResultCount(); i++) {
FaceDetectionResult item = results[i];
// Result coordinates are floats ranging from 0.00 to 1.00
// Multiply with RTSP resolution to get coordinates in pixels
int xmin = (int)(item.xMin() * im_w);
int xmax = (int)(item.xMax() * im_w);
int ymin = (int)(item.yMin() * im_h);
int ymax = (int)(item.yMax() * im_h);
// Draw boundary box
printf("Face %d confidence %d:\t%d %d %d %d\n\r", i, item.score(), xmin, xmax, ymin, ymax);
OSD.drawRect(CHANNEL, xmin, ymin, xmax, ymax, 3, OSD_COLOR_WHITE);
// Print identification text above boundary box
char text_str[40];
snprintf(text_str, sizeof(text_str), "%s %d", item.name(), item.score());
OSD.drawText(CHANNEL, xmin, ymin - OSD.getTextHeight(CHANNEL), text_str, OSD_COLOR_CYAN);
// Draw facial feature points
for (int j = 0; j < 5; j++) {
int x = (int)(item.xFeature(j) * im_w);
int y = (int)(item.yFeature(j) * im_h);
OSD.drawPoint(CHANNEL, x, y, 8, OSD_COLOR_RED);
count++;
if (count == MAX_FACE_DET) {
goto OSDUpdate;
}
}
}
}
OSDUpdate:
OSD.update(CHANNEL);
}
3.2.2 Centralized Storage
Use ffmpeg to subscribe to RTSP streams from the XiaoAnPai board and save them locally in segments:
ffmpeg -rtsp_transport tcp \
-protocol_whitelist file,rtsp,tcp \
-rw_timeout 5000000 \
-i rtsp://192.168.123.6:554/mystream \
-c copy \
-f hls \
-strftime 1 \
-strftime_mkdir 1 \
-hls_time 60 \
-hls_list_size 0 \
-hls_flags delete_segments+append_list \
-hls_segment_filename "./data/%Y%m%d/%Y%m%d_%H%M%S.ts" \
"./data/20250323/playlist.m3u8"
Key functionalities:
- RTSP to HLS Conversion: Converts RTSP live streams (e.g., IP cameras) to HLS format for web playback.
- Segmented Storage: Generates .ts segments (60 seconds each, per the -hls_time setting) stored in ./data/YYYYMMDD/.
- Playlist Management: playlist.m3u8 lists all segments for on-demand playback.
- Automatic Cleanup: Optionally delete old segments to prevent disk overflow.
- Time-based Directory Structure: Dynamically generate directories and filenames based on timestamp.
./data/20250323/20250323_120000.ts ./data/20250323/20250323_120100.ts
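Because the playlist path in the command above hardcodes one date directory, recording must be relaunched per day. A minimal Python launcher sketch (the function name build_ffmpeg_cmd and the paths are illustrative; it assumes ffmpeg is on PATH and mirrors the invocation above):

```python
import os
from datetime import date

RTSP_URL = "rtsp://192.168.123.6:554/mystream"  # XiaoAnPai RTSP endpoint from the examples above
DATA_DIR = "./data"

def build_ffmpeg_cmd(day=None):
    """Assemble the recording command for a given day (defaults to today),
    ensuring the dated output directory exists before ffmpeg writes the playlist."""
    d = (day or date.today()).strftime("%Y%m%d")
    out_dir = os.path.join(DATA_DIR, d)
    os.makedirs(out_dir, exist_ok=True)
    return [
        "ffmpeg", "-rtsp_transport", "tcp", "-i", RTSP_URL,
        "-c", "copy", "-f", "hls", "-strftime", "1",
        "-hls_time", "60", "-hls_list_size", "0",
        "-hls_flags", "delete_segments+append_list",
        "-hls_segment_filename", os.path.join(out_dir, "%Y%m%d_%H%M%S.ts"),
        os.path.join(out_dir, "playlist.m3u8"),
    ]

# To start recording: subprocess.run(build_ffmpeg_cmd(), check=True)
```

A cron job or systemd timer could call this once per day to roll over into a fresh date directory.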
3.2.3 Video Playback
Currently, we've only completed video recording and storage. We still need a corresponding front-end page to display the video, and data transmission requires some back-end software support.
3.2.3.1 Backend
The video streaming service backend, built on the Flask framework, provides structured HLS streaming services. Through automated directory scanning and dynamic routing, it manages and distributes video resources categorized by date, primarily serving scenarios requiring time-based retrieval of video content.
1. Data Discovery Layer
- Directory Scanning: The get_available_dates() function dynamically explores the ../ffmpeg/data directory structure.
- Validation: Verifies both directory naming compliance (YYYYMMDD format) and playlist integrity (presence of playlist.m3u8).
- Timeline Sorting: Valid date directories are sorted in ascending chronological order so the front-end timeline displays accurately.
2. Interface Service Layer
- Visualization Portal: The root route serves the index.html template for the front-end interface.
- Structured Data Interface: The /api/dates endpoint returns a standardized date list (YYYY-MM-DD format), keeping front-end and back-end data separated.
- Intelligent Format Conversion: Stored date identifiers are automatically converted into a user-friendly display format.
- Protocol Adaptation Routing: The /video/ route implements HLS streaming.
- M3U8 Index Distribution: Returns the playlist file for the specified date directly.
- TS Segment Processing: The serve_ts_files function locates segment resources and verifies them securely.
- Four-fold Security Verification: file name length, date format, directory existence, and physical file presence are all checked.
from flask import Flask, render_template, jsonify, send_from_directory
import os
from datetime import datetime
app = Flask(__name__)
DATA_DIR = '../ffmpeg/data'
def get_available_dates():
dates = []
for dirname in os.listdir(DATA_DIR):
dir_path = os.path.join(DATA_DIR, dirname)
m3u8_path = os.path.join(dir_path, 'playlist.m3u8')
if os.path.isdir(dir_path) and os.path.exists(m3u8_path):
try:
datetime.strptime(dirname, '%Y%m%d')
dates.append(dirname)
except ValueError:
continue
dates.sort(key=lambda x: datetime.strptime(x, '%Y%m%d'))
return dates
@app.route('/')
def index():
return render_template('index.html')
@app.route('/api/dates')
def api_dates():
dates = get_available_dates()
formatted_dates = [f"{d[:4]}-{d[4:6]}-{d[6:8]}" for d in dates]
return jsonify({'dates': formatted_dates})
@app.route('/video/<filename>')
def serve_m3u8(filename):
if filename.endswith('.ts'):
return serve_ts_files(filename)
else:
dir_path = os.path.join(DATA_DIR, filename)
return send_from_directory(dir_path, 'playlist.m3u8')
def serve_ts_files(filename):
# Extract the date portion (first 8 digits) from a file name
if len(filename) < 8:
return "Invalid filename", 400
date_str = filename[:8]
try:
# Validate date format
datetime.strptime(date_str, '%Y%m%d')
except ValueError:
return "Invalid date format in filename", 400
# Constructing file paths
dir_path = os.path.join(DATA_DIR, date_str)
print("dir_path is ", dir_path)
if not os.path.isdir(dir_path):
return "Date directory not found", 404
file_path = os.path.join(dir_path, filename)
print("file_path is ", file_path)
if not os.path.exists(file_path):
return "File not found", 404
return send_from_directory(dir_path, filename)
if __name__ == '__main__':
app.run(debug=True)
3.2.3.2 Front-end
The front-end primarily handles human-computer interaction, using the HLS streaming protocol to enable cross-platform video playback. A visual user interface provides date selection, playback control, and status feedback, forming a complete streaming solution together with the back-end services.
1. Presentation Layer Design
- Responsive Layout: Desktop adaptation is achieved through a container maximum-width constraint and automatic margins.
- Visual Hierarchy: The control bar uses a Flex layout to maintain element spacing, and the video player has a full-width black background to enhance visual focus.
- Status Visualization: Playback progress and total duration are displayed dynamically for better feedback.
2. Streaming Adaptation Layer
- Dual-Mode Compatibility: The Hls.js library is preferred for its advanced feature support, with the native HTML5 player as a fallback for environments such as Safari.
- Intelligent Instance Management: HLS objects are destroyed and recreated dynamically to avoid memory leaks across streams.
- Event-Driven Loading: Loading order is guaranteed by chaining the MEDIA_ATTACHED and MANIFEST_PARSED events.
3. Control Logic Layer
- Timed Operation Flow: Select a date → generate a standardized request → initialize the playback engine → play automatically.
- Playback Status Linkage: User actions (play/pause) are synchronized with the video element status in real time.
- Intelligent Date Conversion: Display-format dates are converted automatically to storage-format parameters.
<!-- templates/index.html -->
<!DOCTYPE html>
<html>
<head>
<title>Video surveillance system</title>
<style>
.container {
max-width: 800px;
margin: 20px auto;
padding: 20px;
}
.controls {
margin-bottom: 20px;
display: flex;
gap: 10px;
align-items: center;
}
#videoPlayer {
width: 100%;
background: #000;
}
button {
padding: 5px 15px;
cursor: pointer;
}
</style>
</head>
<body>
<div class="container">
<div class="controls">
<select id="dateSelect">
<option value="">Select Date</option>
</select>
<button id="playBtn">Play</button>
<button id="pauseBtn">Pause</button>
<span id="status">Ready</span>
</div>
<video id="videoPlayer" controls></video>
</div>
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
<script>
const video = document.getElementById('videoPlayer');
let hls = null;
let currentDate = '';
// Initialize HLS support detection
if (Hls.isSupported()) {
hls = new Hls();
hls.attachMedia(video);
}
// Loading available dates
fetch('/api/dates')
.then(res => res.json())
.then(data => {
const select = document.getElementById('dateSelect');
data.dates.forEach(date => {
const option = document.createElement('option');
option.value = date;
option.textContent = date;
select.appendChild(option);
});
});
// Date Selection Event
document.getElementById('dateSelect').addEventListener('change', function() {
const selectedDate = this.value;
if (!selectedDate) return;
currentDate = selectedDate.replace(/-/g, '');
const m3u8Url = `/video/${currentDate}`;
if (Hls.isSupported()) {
if (hls) {
hls.destroy();
}
hls = new Hls();
hls.attachMedia(video);
hls.on(Hls.Events.MEDIA_ATTACHED, () => {
hls.loadSource(m3u8Url);
hls.on(Hls.Events.MANIFEST_PARSED, () => {
video.play();
});
});
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
video.src = m3u8Url;
video.addEventListener('loadedmetadata', () => {
video.play();
});
}
});
// Playback Controls
document.getElementById('playBtn').addEventListener('click', () => {
video.play();
});
document.getElementById('pauseBtn').addEventListener('click', () => {
video.pause();
});
// Update status display
video.addEventListener('timeupdate', () => {
const status = document.getElementById('status');
status.textContent = `Playing - ${formatTime(video.currentTime)}/${formatTime(video.duration)}`;
});
function formatTime(seconds) {
const date = new Date(0);
date.setSeconds(seconds);
return date.toISOString().substr(11, 8);
}
</script>
</body>
</html>
Demo video: https://www.bilibili.com/video/BV1zSXYYAEhJ/
4. Future Prospects
4.1 Event Detection and Quick Navigation
Currently, the front-end only supports manually seeking through the recorded video.
Future improvements: Use face recognition to identify family members and detect strangers, adding event tracks to allow fast navigation in the playback timeline.
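As a sketch of how such an event track could map detections onto the recorded HLS segments, a small lookup helper is shown below (the function name, the in-memory list of segment start times, and the 60-second segment length are assumptions based on the ffmpeg settings above):

```python
from datetime import datetime, timedelta

SEGMENT_SECONDS = 60  # matches the -hls_time setting used for recording

def segment_for_event(event_time, segment_starts):
    """Given an event timestamp and the start times of recorded segments
    (strings in the ffmpeg filename pattern YYYYMMDD_HHMMSS), return the
    .ts segment the event falls into, or None if it is not covered."""
    starts = sorted(datetime.strptime(s, '%Y%m%d_%H%M%S') for s in segment_starts)
    for start in reversed(starts):
        if start <= event_time < start + timedelta(seconds=SEGMENT_SECONDS):
            return start.strftime('%Y%m%d_%H%M%S') + '.ts'
    return None
```

The front-end timeline could then render each detection as a clickable marker that seeks the player to the matching segment.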
4.2 Integration with HomeAssistant
Current implementation is independent with custom frontend/backend.
Embedding the system into HomeAssistant (HASS) would broaden its application and accessibility.
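As a first step, Home Assistant's FFmpeg camera integration can already consume the board's RTSP stream directly. A configuration sketch (the entity name is illustrative; the IP and port follow the examples above):

```yaml
# configuration.yaml (sketch): expose the XiaoAnPai RTSP stream in Home Assistant
camera:
  - platform: ffmpeg
    name: XiaoAnPai
    input: rtsp://192.168.123.6:554/mystream
```

Deeper integration (e.g., surfacing face-detection events as HASS entities) would require a custom component.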