1. Background

The BW21-CBV-Kit, launched by Ai-Thinker, is an open-source hardware development platform for intelligent sensing applications. Its core features include:

Supports on-device AI image recognition algorithms, enabling scenarios such as face recognition, gesture detection, and object identification, with HD image capture via a 2MP camera.

Equipped with dual-band Wi-Fi (2.4GHz + 5GHz) and a high-performance RTL8735B processor (500MHz), meeting real-time image processing and wireless transmission requirements.

Built on the Arduino development environment, providing rich interfaces such as PWM control and sensor drivers, enabling users to develop custom network camera systems.

Compared with traditional closed-source smart cameras, the BW21-CBV-Kit provides unique advantages:

Developers can bypass vendor constraints to create local storage or private cloud solutions, avoiding cloud data leakage risks.

Supports peripherals such as DHT temperature/humidity sensors and ultrasonic ranging modules, allowing smart home security, environmental monitoring, and other multi-functional applications.

Provides HomeAssistant integration and pre-built AI model calls, significantly lowering the development barrier for home-grade smart cameras.

This device has strong potential applications in smart homes, industrial visual inspection, and new retail behavior analysis. Its open-source nature is particularly suitable for maker communities and SMEs seeking customized visual solutions.

Many manufacturers currently offer smart cameras, but these products are closed-source and tied to specific software and cloud services. The XiaoAnPai (BW21-CBV-Kit) board, by contrast, can be used to build a home network camera running a fully custom software stack.

2. Objectives

The core goal of this project is to develop an intelligent home security monitoring system based on the XiaoAnPai (BW21-CBV-Kit) hardware platform, focusing on three key functional modules:

1. 24/7 audio and video acquisition system

2. Tiered storage architecture

3. Cloud-based interactive system

In addition, a video retrieval module is developed for retrieving daily surveillance footage; its web interface should be as user-friendly as possible, allowing video files to be selected by date.

The overall system design adheres to a collaborative "end-edge-cloud" architecture, ultimately creating a complete home security solution while ensuring data security.

3. Design Approach

3.1 Hardware Design

Development Board: BW21-CBV-Kit connected to a GC2053 camera.

Camera Enclosure: a DIY casing made from the product's packaging box and assembled by hand.

3.2 Software Design

3.2.1 Microcontroller

On the microcontroller side, the board outputs an RTSP video stream while marking detected faces in the video:

#include "WiFi.h"
#include "StreamIO.h"
#include "VideoStream.h"
#include "RTSP.h"
#include "NNFaceDetection.h"
#include "VideoStreamOverlay.h"
#define CHANNEL   0
#define CHANNELNN 3
// Lower resolution for NN processing
#define NNWIDTH  576
#define NNHEIGHT 320
// Cap on the number of OSD feature-point draws in the callback below
// (assumed value; define it here if the SDK headers do not provide it)
#define MAX_FACE_DET 20
VideoSetting config(VIDEO_FHD, 30, VIDEO_H264, 0);
VideoSetting configNN(NNWIDTH, NNHEIGHT, 10, VIDEO_RGB, 0);
NNFaceDetection facedet;
RTSP rtsp;
StreamIO videoStreamer(1, 1);
StreamIO videoStreamerNN(1, 1);
char ssid[] = "Network_SSID";    // your network SSID (name)
char pass[] = "Password";        // your network password
int status = WL_IDLE_STATUS;
IPAddress ip;
int rtsp_portnum;
void setup()
{
    Serial.begin(115200);
    // attempt to connect to Wifi network:
    while (status != WL_CONNECTED) {
        Serial.print("Attempting to connect to WPA SSID: ");
        Serial.println(ssid);
        status = WiFi.begin(ssid, pass);
        // wait 2 seconds for connection:
        delay(2000);
    }
    ip = WiFi.localIP();
    // Configure camera video channels with video format information
    // Adjust the bitrate based on your WiFi network quality
    config.setBitrate(2 * 1024 * 1024);    // Recommend to use 2Mbps for RTSP streaming to prevent network congestion
    Camera.configVideoChannel(CHANNEL, config);
    Camera.configVideoChannel(CHANNELNN, configNN);
    Camera.videoInit();
    // Configure RTSP with corresponding video format information
    rtsp.configVideo(config);
    rtsp.begin();
    rtsp_portnum = rtsp.getPort();
    // Configure face detection with corresponding video format information
    // Select Neural Network(NN) task and models
    facedet.configVideo(configNN);
    facedet.setResultCallback(FDPostProcess);
    facedet.modelSelect(FACE_DETECTION, NA_MODEL, DEFAULT_SCRFD, NA_MODEL);
    facedet.begin();
    // Configure StreamIO object to stream data from video channel to RTSP
    videoStreamer.registerInput(Camera.getStream(CHANNEL));
    videoStreamer.registerOutput(rtsp);
    if (videoStreamer.begin() != 0) {
        Serial.println("StreamIO link start failed");
    }
    // Start data stream from video channel
    Camera.channelBegin(CHANNEL);
    // Configure StreamIO object to stream data from RGB video channel to face detection
    videoStreamerNN.registerInput(Camera.getStream(CHANNELNN));
    videoStreamerNN.setStackSize();
    videoStreamerNN.setTaskPriority();
    videoStreamerNN.registerOutput(facedet);
    if (videoStreamerNN.begin() != 0) {
        Serial.println("StreamIO link start failed");
    }
    // Start video channel for NN
    Camera.channelBegin(CHANNELNN);
    // Start OSD drawing on RTSP video channel
    OSD.configVideo(CHANNEL, config);
    OSD.begin();
}
void loop()
{
    // Do nothing
}
// User callback function for post processing of face detection results
void FDPostProcess(std::vector<FaceDetectionResult> results)
{
    int count = 0;
    uint16_t im_h = config.height();
    uint16_t im_w = config.width();
    Serial.print("Network URL for RTSP Streaming: ");
    Serial.print("rtsp://");
    Serial.print(ip);
    Serial.print(":");
    Serial.println(rtsp_portnum);
    Serial.println(" ");
    printf("Total number of faces detected = %d\r\n", facedet.getResultCount());
    OSD.createBitmap(CHANNEL);
    if (facedet.getResultCount() > 0) {
        for (int i = 0; i < facedet.getResultCount(); i++) {
            FaceDetectionResult item = results[i];
            // Result coordinates are floats ranging from 0.00 to 1.00
            // Multiply with RTSP resolution to get coordinates in pixels
            int xmin = (int)(item.xMin() * im_w);
            int xmax = (int)(item.xMax() * im_w);
            int ymin = (int)(item.yMin() * im_h);
            int ymax = (int)(item.yMax() * im_h);
            // Draw boundary box
            printf("Face %ld confidence %d:\t%d %d %d %d\n\r", i, item.score(), xmin, xmax, ymin, ymax);
            OSD.drawRect(CHANNEL, xmin, ymin, xmax, ymax, 3, OSD_COLOR_WHITE);
            // Print identification text above boundary box
            char text_str[40];
            snprintf(text_str, sizeof(text_str), "%s %d", item.name(), item.score());
            OSD.drawText(CHANNEL, xmin, ymin - OSD.getTextHeight(CHANNEL), text_str, OSD_COLOR_CYAN);
            // Draw facial feature points
            for (int j = 0; j < 5; j++) {
                int x = (int)(item.xFeature(j) * im_w);
                int y = (int)(item.yFeature(j) * im_h);
                OSD.drawPoint(CHANNEL, x, y, 8, OSD_COLOR_RED);
                count++;
                if (count == MAX_FACE_DET) {
                    goto OSDUpdate;
                }
            }
        }
    }
OSDUpdate:
    OSD.update(CHANNEL);
}
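
After flashing, the sketch prints the stream address (rtsp://&lt;board-ip&gt;:&lt;port&gt;) to the serial monitor; it can be opened in any RTSP-capable player, such as VLC, to verify the stream and the face-detection overlay.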

3.2.2 Centralized Storage

Use ffmpeg to pull the RTSP stream from the XiaoAnPai board and save it locally as 60-second HLS segments organized into per-date directories (-strftime_mkdir lets ffmpeg create the dated directories automatically):

ffmpeg -rtsp_transport tcp \
-protocol_whitelist file,rtsp,tcp \
-rw_timeout 5000000 \
-i rtsp://192.168.123.6:554/mystream \
-c copy \
-f hls \
-strftime 1 \
-strftime_mkdir 1 \
-hls_time 60 \
-hls_list_size 0 \
-hls_flags delete_segments+append_list \
-hls_segment_filename "./data/%Y%m%d/%Y%m%d_%H%M%S.ts" \
"./data/20250323/playlist.m3u8"

This produces segment files such as:

./data/20250323/20250323_120000.ts

./data/20250323/20250323_120100.ts
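
Because the playlist path embeds a fixed date, the ffmpeg process must be restarted with a fresh playlist path when the day rolls over, and it should also be respawned if the connection drops. Below is a minimal Python supervisor sketch; the script name, restart policy, and timing values are assumptions, not part of the current system:

# run_recorder.py -- hypothetical supervisor for the ffmpeg command above:
# restarts ffmpeg at midnight so the playlist path matches the current date,
# and respawns it if it exits (e.g. after a network drop).
import datetime
import pathlib
import subprocess
import time

RTSP_URL = "rtsp://192.168.123.6:554/mystream"
DATA_DIR = pathlib.Path("./data")

def build_cmd(day: str) -> list:
    day_dir = DATA_DIR / day
    day_dir.mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-rtsp_transport", "tcp", "-rw_timeout", "5000000",
        "-i", RTSP_URL,
        "-c", "copy", "-f", "hls",
        "-strftime", "1", "-strftime_mkdir", "1",
        "-hls_time", "60", "-hls_list_size", "0",
        "-hls_flags", "delete_segments+append_list",
        "-hls_segment_filename", str(DATA_DIR / "%Y%m%d" / "%Y%m%d_%H%M%S.ts"),
        str(day_dir / "playlist.m3u8"),
    ]

while True:
    today = datetime.date.today()
    proc = subprocess.Popen(build_cmd(today.strftime("%Y%m%d")))
    while datetime.date.today() == today:
        if proc.poll() is not None:    # ffmpeg exited: back off, then respawn
            time.sleep(5)
            proc = subprocess.Popen(build_cmd(today.strftime("%Y%m%d")))
        time.sleep(30)
    proc.terminate()                   # day rolled over: restart with a new playlist
    proc.wait()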

3.2.3 Video Playback

So far we have only implemented video recording and storage. A corresponding front-end page is still needed to display the video, and serving the data requires some back-end software support.

3.2.3.1 Backend

The backend, built on the Flask framework, provides a structured HLS streaming service. Through automated directory scanning and dynamic routing, it manages and serves video resources organized by date, targeting scenarios that require time-based retrieval of footage. It consists of two layers:

1. Data Discovery Layer: scans the ffmpeg output directory for date-named folders containing a playlist.m3u8 and builds the list of dates with available footage.

2. Interface Service Layer: exposes Flask routes for the main page, the date-list API, and the HLS playlist and segment files.

from flask import Flask, render_template, jsonify, send_from_directory
import os
from datetime import datetime
app = Flask(__name__)
DATA_DIR = '../ffmpeg/data'
def get_available_dates():
    dates = []
    for dirname in os.listdir(DATA_DIR):
        dir_path = os.path.join(DATA_DIR, dirname)
        m3u8_path = os.path.join(dir_path, 'playlist.m3u8')
        if os.path.isdir(dir_path) and os.path.exists(m3u8_path):
            try:
                datetime.strptime(dirname, '%Y%m%d')
                dates.append(dirname)
            except ValueError:
                continue
    dates.sort(key=lambda x: datetime.strptime(x, '%Y%m%d'))
    return dates
@app.route('/')
def index():
    return render_template('index.html')
@app.route('/api/dates')
def api_dates():
    dates = get_available_dates()
    formatted_dates = [f"{d[:4]}-{d[4:6]}-{d[6:8]}" for d in dates]
    return jsonify({'dates': formatted_dates})
@app.route('/video/<filename>')
def serve_m3u8(filename):
    # HLS players resolve segment URIs relative to the playlist URL,
    # so requests for .ts segments also arrive on this route
    if filename.endswith(".ts"):
        return serve_ts_files(filename)
    else:
        # Otherwise the path component is a date directory (YYYYMMDD)
        dir_path = os.path.join(DATA_DIR, filename)
        return send_from_directory(dir_path, 'playlist.m3u8')
def serve_ts_files(filename):
    # Extract the date portion (first 8 digits) from a file name
    if len(filename) < 8:
        return "Invalid filename", 400
   
    date_str = filename[:8]
    try:
        # Validate date format
        datetime.strptime(date_str, '%Y%m%d')
    except ValueError:
        return "Invalid date format in filename", 400
    # Constructing file paths
    dir_path = os.path.join(DATA_DIR, date_str)
    print("dir_path is ", dir_path)
    if not os.path.isdir(dir_path):
        return "Date directory not found", 404
    file_path = os.path.join(dir_path, filename)
    print("file_path is ", file_path)
    if not os.path.exists(file_path):
        return "File not found", 404
    return send_from_directory(dir_path, filename)
if __name__ == '__main__':
    app.run(debug=True)
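
With the backend running (python app.py serves on http://127.0.0.1:5000 by default), the endpoints can be exercised directly. A small sketch using the requests library; the date value is purely illustrative:

# smoke_test.py -- assumes the Flask app above is running locally
import requests

BASE = "http://127.0.0.1:5000"

# List dates with recorded footage, e.g. {"dates": ["2025-03-23"]}
print(requests.get(f"{BASE}/api/dates").json())

# Fetch the HLS playlist for a date (storage format: YYYYMMDD)
playlist = requests.get(f"{BASE}/video/20250323")
print(playlist.text.splitlines()[:5])    # starts with #EXTM3U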


3.2.3.2 Front-end

The front end primarily handles human-computer interaction, using the HLS streaming protocol to enable cross-platform video playback. A visual user interface is built for date selection, playback control, and status feedback, forming a complete streaming solution together with the back-end services.

1. Presentation Layer Design

Desktop adaptation is achieved through a maximum container width and automatic margins.

The control bar uses a flex layout to maintain element spacing, and the video player has a full-width black background to enhance visual focus.

Playback progress and total duration are displayed dynamically for better feedback.

 

2. Streaming Adaptation Layer

The Hls.js library is preferred for its advanced feature support.

A native HTML5 player is used as a fallback to ensure usability in environments such as Safari.

The HLS instance is destroyed and recreated on each date change to avoid memory leaks across streams.

Loading order is enforced through the MEDIA_ATTACHED and MANIFEST_PARSED event chain.

 

3. Control Logic Layer

Select a date → generate a standardized request → initialize the playback engine → play automatically.

User actions (play/pause) are synchronized with the video element state in real time.

Display-format dates (YYYY-MM-DD) are converted automatically to storage-format parameters (YYYYMMDD).

 

<!-- templates/index.html -->
<!DOCTYPE html>
<html>
<head>
    <title>Video surveillance system</title>
    <!-- hls.js provides the Hls object used in the script below -->
    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>

    <style>
        .container {
            max-width: 800px;
            margin: 20px auto;
            padding: 20px;
        }
        .controls {
            margin-bottom: 20px;
            display: flex;
            gap: 10px;
            align-items: center;
        }
        #videoPlayer {
            width: 100%;
            background: #000;
        }
        button {
            padding: 5px 15px;
            cursor: pointer;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="controls">
            <select id="dateSelect">
 <option value="">Select Date</option>
 <button id="playBtn">Play</button>
 <button id="pauseBtn">Pause</button>
 <button id="pauseBtn">Pause</button>
 <span id="status">Ready</span>
        </div>
        <video id="videoPlayer" controls></video>
    </div>
    <script>
        const video = document.getElementById('videoPlayer');
        let hls = null;
        let currentDate = '';
        // Initialize HLS support detection
        if (Hls.isSupported()) {
            hls = new Hls();
            hls.attachMedia(video);
        }
        // Loading available dates
        fetch('/api/dates')
            .then(res => res.json())
            .then(data => {
                const select = document.getElementById('dateSelect');
                data.dates.forEach(date => {
                    const option = document.createElement('option');
                    option.value = date;
                    option.textContent = date;
                    select.appendChild(option);
                });
            });
        // Date Selection Event
        document.getElementById('dateSelect').addEventListener('change', function() {
            const selectedDate = this.value;
            if (!selectedDate) return;
            currentDate = selectedDate.replace(/-/g, '');
            const m3u8Url = `/video/${currentDate}`;
            
            if (Hls.isSupported()) {
                if (hls) {
                    hls.destroy();
                }
                hls = new Hls();
                hls.attachMedia(video);
                hls.on(Hls.Events.MEDIA_ATTACHED, () => {
                    hls.loadSource(m3u8Url);
                    hls.on(Hls.Events.MANIFEST_PARSED, () => {
                        video.play();
                    });
                });
            } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
                video.src = m3u8Url;
                video.addEventListener('loadedmetadata', () => {
                    video.play();
                });
            }
        });
        // Playback Controls
        document.getElementById('playBtn').addEventListener('click', () => {
            video.play();
        });
        document.getElementById('pauseBtn').addEventListener('click', () => {
            video.pause();
        });
        // Update status display
        video.addEventListener('timeupdate', () => {
            const status = document.getElementById('status');
            status.textContent = `Playing - ${formatTime(video.currentTime)}/${formatTime(video.duration)}`;
        });
        function formatTime(seconds) {
            if (!isFinite(seconds)) {
                return '--:--:--';    // live playlists report Infinity for duration
            }
            const date = new Date(0);
            date.setSeconds(seconds);
            return date.toISOString().slice(11, 19);
        }
    </script>
</body>
</html>


Demo video: https://www.bilibili.com/video/BV1zSXYYAEhJ/

4. Future Prospects

4.1 Event Detection and Quick Navigation

Currently, the front end only supports manually seeking through the video.

Future improvements: use face recognition to identify family members and detect strangers, and add event tracks to the playback timeline to allow fast navigation.
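
As a rough sketch of how an event track could be served, a future detection process might write an events.json into each date directory, which the Flask backend from 3.2.3.1 exposes to the front end; the endpoint and file format below are hypothetical:

# Hypothetical addition to the Flask app in 3.2.3.1. Assumes a detection
# process writes ../ffmpeg/data/<YYYYMMDD>/events.json as a list like
# [{"t": 4521.0, "label": "stranger"}], with t in seconds from day start.
import json

@app.route('/api/events/<date_str>')
def api_events(date_str):
    path = os.path.join(DATA_DIR, date_str, 'events.json')
    if not os.path.exists(path):
        return jsonify({'events': []})
    with open(path) as f:
        return jsonify({'events': json.load(f)})

The front end could then render these timestamps as markers on the timeline and jump to one with video.currentTime = event.t.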

 

4.2 Integration with HomeAssistant 

The current implementation is standalone, with a custom front end and back end.

Embedding the system into HomeAssistant (HASS) would broaden its application and accessibility.