• RK3576 with UFS Storage: In-depth Analysis of Performance Advantages and Read-Write Test Data

    a day ago 0 comments

    In the embedded storage field, UFS (Universal Flash Storage) is steadily gaining ground. UFS is a type of flash memory. Like eMMC, it combines NAND storage chips with a controller chip behind a standard interface in a standard package, forming a highly integrated storage device. Because of its compact size, UFS is widely used in embedded devices such as mobile phones and tablets. And because UFS far outperforms eMMC, it is often used in high-end products.

    Advantages of UFS

    1. Faster response speed for multitasking

    Devices using UFS 2.0 have a dedicated serial interface based on LVDS (Low-Voltage Differential Signaling), which allows read and write operations to be carried out simultaneously. The CQ (Command Queue) dynamically allocates tasks without waiting for the previous one to finish. It is like a car on a highway, where multiple lanes allow fast, smooth travel. In contrast, devices using eMMC must perform read and write operations separately, and their commands are also packed together. eMMC is already at a disadvantage in raw speed, and it is naturally slower still when multitasking. It is like traveling on an ordinary two-lane road with speed limits.

    2. Low latency: UFS responds three times faster

    When loading large games and large files, UFS 2.0 takes less time. Loading a game takes one-third of the time required with eMMC 5.0. Correspondingly, during gameplay, mobile phones with UFS 2.0 have lower latency and smoother graphics.

    3. Shorter loading time for photo thumbnails in the album

    Take the phone's photo album as an example. Many people's phones hold hundreds or even thousands of photos. When you open the thumbnail view, you can clearly see the thumbnails loading. This happens when the phone cannot read photos from flash memory fast enough to keep up with the screen refresh. On a more capable phone, the pictures load smoothly as you scroll, while on a less capable phone, you can clearly feel the lag during loading.

    4. Faster speed and lower power consumption

    Because a UFS chip is faster, it takes less time to complete the same task, and finishing sooner means lower power consumption. While working, UFS consumes 10% less power than eMMC, and it can save approximately 35% of power in daily use.

    UFS interface read-write performance test

    The RK3576 provides both a UFS 2.0 interface and an eMMC 5.1 interface.

    The FET3576-C SoM also reserves a UFS interface.

    Refer to Rockchip’s official document “Rockchip_Developer_Guide_UFS_CN_V1.3.0” to conduct read-write tests on the UFS flash memory of OK3576-C.
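
    Since the write test in this section targets the raw device /dev/sda and overwrites its contents, it may be worth confirming first which device that node is. The following is a minimal sketch, assuming the UFS device is exposed through the kernel's SCSI layer (which is why it appears as sda) and that sysfs is mounted at /sys. The helper name describe_disk and its optional second argument are illustrative and do not come from the Rockchip guide.

```shell
#!/bin/sh
# Hypothetical helper: print vendor and model of a SCSI-type block device.
# $1 = device name without /dev/ (for example sda)
# $2 = sysfs root, defaults to /sys (overridable so the function can be tested)
describe_disk() {
    dev="$1"
    sysfs="${2:-/sys}"
    dir="$sysfs/block/$dev/device"
    if [ ! -d "$dir" ]; then
        echo "no such block device: $dev" >&2
        return 1
    fi
    vendor=""
    model=""
    # read strips the padding spaces that sysfs keeps around these strings
    [ -r "$dir/vendor" ] && read -r vendor < "$dir/vendor"
    [ -r "$dir/model" ] && read -r model < "$dir/model"
    echo "$dev: vendor=$vendor model=$model"
}
```

    On the board, the call would be: describe_disk sda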

    Sequential write test

    root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=write -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100write_1024k
    seq_100write_1024k: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
    ... 
    fio-3.34 
    Starting 8 threads 
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
    note: both iodepth >= 1 and synchronous...
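
    The repeated note in the output above means the requested queue depth of 32 is not in effect: the psync engine reported by fio is synchronous, so each thread keeps only one I/O in flight. The sketch below prints (but does not run) a variant of the command that selects an asynchronous engine. It assumes the fio build on the board includes the libaio engine, which this excerpt does not confirm, and build_fio_cmd is a helper name made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a fio command line for a sequential test.
# Assumption: the board's fio build includes the libaio engine.
# WARNING: a write test against a raw device such as /dev/sda destroys its data.
build_fio_cmd() {
    dev="$1"    # target device or file, for example /dev/sda
    rw="$2"     # fio I/O pattern, for example write or read
    printf 'fio --filename=%s --direct=1 --ioengine=libaio --iodepth=32 --thread --rw=%s --bs=1024k --size=1G --numjobs=8 --runtime=180 --group_reporting --name=seq_%s_1024k\n' "$dev" "$rw" "$rw"
}

build_fio_cmd /dev/sda write
```

    Results from such a run are not directly comparable with the psync figures, because the effective queue depth differs.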

  • A Comprehensive Guide to Deploying the DeepSeek-R1 Large Model on the Forlinx OK3588-C Development Board (Part 1)

    3 days ago 0 comments

    DeepSeek, a representative AI large language model, has attracted wide attention in the global artificial intelligence field for its reasoning ability and efficient text generation. As the latest iteration of the series, DeepSeek-R1 brings advances in areas such as long-text processing efficiency, planned multi-modal expansion, and embedded adaptation.

    RK3588, the flagship chip from Rockchip, has become an ideal platform for embedded AI applications thanks to its multi-core heterogeneous computing power and strong CPU, GPU, and NPU performance. Integrating DeepSeek-R1 with the OK3588-C development board extends large AI models from the cloud to the edge. This combination of "advanced algorithm + customized chip" meets key edge-side requirements such as real-time performance and privacy protection, and it also builds a complete value chain from technology R&D to industrial use, providing a reusable model for the intelligent transformation of various industries. The following sections describe how this is done.

    01 Porting Process

    (1) Download the DeepSeek-R1 Source Code

    On the Ubuntu virtual machine, download the DeepSeek-R1-Distill-Qwen-1.5B weight file from the official DeepSeek-R1 website.

    (2) Install the Conversion Tool

    Create a virtual environment on Ubuntu and install RKLLM-Toolkit to convert the DeepSeek-R1 large language model into the RKLLM model format and compile the executable program for board-side inference.

    (3) Model Conversion

    Use RKLLM-Toolkit to convert the model. The toolkit provides model conversion and quantization functions. Conversion, one of its core functions, lets users turn large language models in Hugging Face or GGUF format into RKLLM models, which can then be loaded and run on the Rockchip NPU.

    (4) Compile the DeepSeek-R1 Program

    Install the cross-compilation toolchain to compile the RKLLM Runtime executable file. This program covers the whole workflow: model initialization, model inference, handling of output through a callback function, and release of model resources.

    (5) Model Deployment

    Upload the converted RKLLM model and the compiled executable to the board and run them. You can then chat with DeepSeek-R1 over the debug serial port of the OK3588-C development board, with no internet connection.
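
    Step (5) can be sketched as a small script. Everything specific here is an assumption made for illustration: the board address, the target directory, and the file names deepseek-r1.rkllm and llm_demo are placeholders, and scp presumes the board runs an SSH server (copying over USB or a TF card would serve equally well). By default the script only prints the command.

```shell
#!/bin/sh
# run: print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# deploy: copy the converted model and the runtime executable to the board.
# All four arguments are placeholders chosen by the user.
deploy() {
    board="$1"      # for example root@192.168.0.232 (hypothetical address)
    dest="$2"       # target directory on the board
    model="$3"      # converted .rkllm model file
    exe="$4"        # cross-compiled executable
    run scp "$model" "$exe" "$board:$dest/"
}

deploy root@192.168.0.232 /userdata deepseek-r1.rkllm llm_demo
```

    Setting DRY_RUN=0 in the environment makes the script perform the copy instead of printing it.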

    02 Demo

    DeepSeek-R1 is a multi-functional artificial intelligence assistant that can provide efficient and comprehensive support in multiple fields. Even the local offline version can give accurate and practical suggestions based on its powerful data-processing ability and extensive knowledge repository, whether it's for daily information retrieval needs, maintenance guidance for professional equipment, solutions to complex mathematical problems, or assistance in completing programming tasks. It has become a reliable partner for users in exploring various fields.

    (1) General Information Search

    DeepSeek-R1 can quickly retrieve and provide accurate information. For example, when asked about "Forlinx Embedded Technology Co., Ltd.", DeepSeek-R1 can introduce the company's background, main business, product features, etc. in detail, helping users understand the company comprehensively.

    (2) Maintenance Advice for Professional Equipment Problems

    For professional equipment problems, DeepSeek-R1 can provide detailed fault analysis and solutions. For instance, regarding the problem of the PLC reporting error code E01, R1 analyzes the possible causes of the fault, such as power supply problems, wiring errors, or hardware failures, and provides corresponding solution steps to help users quickly troubleshoot the fault.

    (3) Solving Math Problems

    DeepSeek-R1 has excellent mathematical operation ability and is good at solving various mathematical problems. For example,...

  • How to Compile and Run NPU Test Programs Based on rknn_yolov5_demo on RK3568?

    02/13/2025 at 07:41 0 comments

    When developing NPU (Neural Processing Unit) applications on the RK3568, compiling and running test programs is a crucial step. This article takes rknn_yolov5_demo as an example and walks through each step in detail.

    1. Preparation: Locate the Compilation Script

    Open external/rknpu2/examples/rknn_yolov5_demo/build-linux_RK3566_RK3568.sh

    2. Key Configuration: Modify the Cross-Compilation Toolchain Path

    After opening the build-linux_RK3566_RK3568.sh file, set GCC_COMPILER to the path of the cross-compilation toolchain and save the file.
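
    A quick sanity check can catch a wrong toolchain path before the build starts. The sketch below assumes that GCC_COMPILER is used as a prefix to which -gcc and -g++ are appended (a value such as /opt/toolchain/bin/aarch64-linux-gnu, which is a hypothetical path). If the script in your SDK uses the variable differently, the check needs to be adapted. check_toolchain is an illustrative helper and is not part of rknpu2.

```shell
#!/bin/sh
# Hypothetical helper: verify that <prefix>-gcc and <prefix>-g++ can be found.
# $1 = toolchain prefix, either a full path prefix or a name resolved via PATH
check_toolchain() {
    prefix="$1"
    for tool in gcc g++; do
        if ! command -v "${prefix}-${tool}" >/dev/null 2>&1; then
            echo "compiler not found: ${prefix}-${tool}" >&2
            return 1
        fi
    done
    echo "toolchain OK: ${prefix}"
}
```

    For example: check_toolchain /opt/toolchain/bin/aarch64-linux-gnu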

    3. Compile the rknn_yolov5_demo Program

    In the terminal command window, navigate to the rknn_yolov5_demo folder:

    cd external/rknpu2/examples/rknn_yolov5_demo/

    Run the build-linux_RK3566_RK3568.sh script to compile the program:

    ./build-linux_RK3566_RK3568.sh

    4. File Transfer: From Local to Development Board

    Copy the contents of the install directory to the development board.

    5. Navigate to the Correct Directory: Preparing to Run the Program

    Navigate to the corresponding directory on the development board.

    6. Set the Library File Path

    export LD_LIBRARY_PATH=./lib
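
    Because ./lib is a relative path, the setting above only works while the program is started from the directory that contains lib. A hedged alternative, assuming a POSIX shell on the board, resolves the directory to an absolute path and keeps any value that was already set. lib_path_for is a name invented for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: build an LD_LIBRARY_PATH value from a library directory.
# $1 = directory holding the shared libraries (may be relative)
# $2 = existing LD_LIBRARY_PATH value (may be empty)
lib_path_for() {
    # Resolve to an absolute path; fail if the directory does not exist
    abs=$(cd "$1" && pwd) || return 1
    if [ -n "$2" ]; then
        echo "$abs:$2"
    else
        echo "$abs"
    fi
}

# Usage from the demo directory on the board:
#   export LD_LIBRARY_PATH="$(lib_path_for ./lib "$LD_LIBRARY_PATH")"
```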

    7. Run the Program to Identify Object Categories in the Image

    The program is run as ./rknn_yolov5_demo followed by the path to the RKNN model and the path to the input image:

    ./rknn_yolov5_demo ./model/RK3566_RK3568/yolov5s-640-640.rknn ./model/bus.jpg
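
    A small wrapper can report a missing model or image before the program is started. This is a sketch under assumptions: run_demo is a hypothetical helper, and it relies only on the two-argument invocation shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: check the inputs, then start the demo.
# $1 = demo executable, $2 = .rknn model file, $3 = input image
run_demo() {
    exe="$1"
    model="$2"
    image="$3"
    [ -x "$exe" ]   || { echo "not executable: $exe" >&2; return 1; }
    [ -f "$model" ] || { echo "model not found: $model" >&2; return 1; }
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
    "$exe" "$model" "$image"
}
```

    For example: run_demo ./rknn_yolov5_demo ./model/RK3566_RK3568/yolov5s-640-640.rknn ./model/bus.jpg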

    8. View the Results

    Finally, copy the result image generated on the OK3568-C development board to a local computer and view it. The image shows the objects the program recognized, so the accuracy of the recognition can be checked and the NPU test program can be evaluated.

    By following these detailed steps, the rknn_yolov5_demo NPU test program on RK3568 can be successfully compiled and run. It is hoped that this article will be helpful for the development work.