In the embedded storage field, UFS (Universal Flash Storage) is steadily gaining ground. UFS is a type of flash storage. Like eMMC, it combines NAND flash chips with a controller, a standard interface, and standard packaging to form a highly integrated storage chip. Because it is so compact, UFS is widely used in embedded devices such as mobile phones and tablets. And because it far outperforms eMMC, it is often found in high-end products.
Advantages of UFS
1. Faster response speed for multitasking
Devices using UFS 2.0 have a dedicated serial interface based on LVDS (Low-Voltage Differential Signaling), which allows read and write operations to be carried out simultaneously. The Command Queue (CQ) dispatches tasks dynamically without waiting for the previous one to finish. It is like a car entering a highway, where multiple lanes allow fast and smooth travel. In contrast, mobile phones using eMMC must perform read and write operations separately, and commands are also packed together. eMMC is already at a disadvantage in raw speed, and it is naturally slower under multitasking. It is like traveling on an ordinary two-lane road with a speed limit.
2. Low latency: UFS responds about three times faster
When loading large games and large files, UFS 2.0 takes less time. Loading a game takes about one-third of the time required by eMMC 5.0. Correspondingly, mobile phones with UFS 2.0 show lower latency and smoother graphics during gameplay.
3. Shorter loading time for photo thumbnails in the album
Take the mobile phone album as an example. Many people's phones hold hundreds or even thousands of photos. When you open the thumbnail view in the album, you can clearly see the thumbnails being loaded. This happens because reading photos from the flash memory cannot keep up with the screen refresh. On a phone with fast flash memory, the pictures load smoothly as you scroll, while on a less capable phone you can clearly feel the lag during loading.
4. Faster speed and lower power consumption
Because a UFS chip is faster, it needs less time to complete the same task, and higher efficiency means lower power consumption. While actively working, UFS consumes about 10% less power than eMMC, and it can save approximately 35% of power in everyday use.
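The highway comparison in point 1 can be sketched as a toy timing model. This is only an illustration under idealized assumptions: the function name transfer_time and all sizes and speeds are hypothetical, not measurements, and real devices have overheads that this ignores. In "half" mode reads and writes take turns, so their times add up. In "full" mode they overlap, so the longer of the two sets the total.

```shell
#!/bin/sh
# Toy timing model (hypothetical numbers, not measurements).
# transfer_time MODE READ_MB WRITE_MB READ_MBPS WRITE_MBPS
#   MODE=half : reads and writes take turns, so their times add up
#   MODE=full : reads and writes overlap, so the longer one sets the total
transfer_time() {
    awk -v mode="$1" -v rmb="$2" -v wmb="$3" -v rs="$4" -v ws="$5" 'BEGIN {
        tr = rmb / rs                      # seconds spent reading
        tw = wmb / ws                      # seconds spent writing
        total = (mode == "full") ? (tr > tw ? tr : tw) : tr + tw
        printf "%.2f\n", total
    }'
}

transfer_time half 400 100 200 100   # prints 3.00
transfer_time full 400 100 200 100   # prints 2.00
```

With these made-up figures, the overlapped case finishes in 2 s instead of 3 s. The gain depends entirely on how evenly the workload mixes reads and writes.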
UFS interface read-write performance test
The RK3576 CPU provides both a UFS 2.0 interface and an eMMC 5.1 interface.
The FET3576-C SoM also reserves a UFS interface.
The read-write tests below on the UFS flash memory of the OK3576-C follow Rockchip's official document "Rockchip_Developer_Guide_UFS_CN_V1.3.0".
Sequential write test
root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=write -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100write_1024k
seq_100write_1024k: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
...
fio-3.34
Starting 8 threads
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [W(8)][96.0%][w=359MiB/s][w=359 IOPS][eta 00m:01s]
seq_100write_1024k: (groupid=0, jobs=8): err= 0: pid=1296: Thu Jan 1 00:01:32 1970
write: IOPS=332, BW=333MiB/s (349MB/s)(8192MiB/24631msec); 0 zone resets
clat (msec): min=2, max=103, avg=23.55, stdev= 9.15
lat (msec): min=2, max=104, avg=23.77, stdev= 9.15
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 18], 40.00th=[ 20], 50.00th=[ 22], 60.00th=[ 25],
| 70.00th=[ 27], 80.00th=[ 31], 90.00th=[ 36], 95.00th=[ 41],
| 99.00th=[ 53], 99.50th=[ 59], 99.90th=[ 68], 99.95th=[ 73],
| 99.99th=[ 105]
bw ( KiB/s): min=206590, max=432470, per=100.00%, avg=342387.68, stdev=7157.63, samples=385
iops : min= 196, max= 421, avg=331.98, stdev= 7.14, samples=385
lat (msec) : 4=0.11%, 10=0.49%, 20=42.49%, 50=55.44%, 100=1.45%
lat (msec) : 250=0.01%
cpu : usr=1.12%, sys=1.83%, ctx=18228, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=333MiB/s (349MB/s), 333MiB/s-333MiB/s (349MB/s-349MB/s), io=8192MiB (8590MB), run=24631-24631msec
Disk stats (read/write):
sda: ios=165/65464, merge=0/0, ticks=178/1074993, in_queue=1075171, util=99.64%
The output above shows that the sequential write speed is 349 MB/s (333 MiB/s).
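The fio output above notes that the queue depth is capped at 1, because the default psync engine is synchronous and the iodepth setting of 32 therefore has no effect. A possible variant, sketched below, selects fio's asynchronous libaio engine so that the requested queue depth is actually used. This is an assumption, not part of the original test: the helper name build_fio_cmd and the RUN_FIO switch are hypothetical, libaio must be compiled into the fio build on the board, and the effect on the measured figures has not been verified here. Writing to a raw block device such as /dev/sda destroys the data on it.

```shell
#!/bin/sh
# Build a sequential fio command that uses an asynchronous engine so that
# -iodepth is honoured. Same options as the test above, plus -ioengine=libaio.
# build_fio_cmd DEVICE RW   (RW is "write" or "read")
build_fio_cmd() {
    dev=$1
    rw=$2
    echo "fio -filename=$dev -direct=1 -ioengine=libaio -iodepth 32 -thread -rw=$rw -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100${rw}_1024k"
}

# /dev/sda is the UFS device used in the test above.
cmd=$(build_fio_cmd /dev/sda write)
echo "$cmd"

# Nothing is executed unless RUN_FIO=1 is set explicitly.
# WARNING: a write test on a raw block device overwrites its contents.
if [ "${RUN_FIO:-0}" = "1" ]; then
    $cmd
fi
```

Printing the command first makes it easy to review the target device before anything is written.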
Sequential read test
root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=read -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100read_1024k
seq_100read_1024k: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
...
fio-3.34
Starting 8 threads
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [R(8)][100.0%][r=756MiB/s][r=755 IOPS][eta 00m:00s]
seq_100read_1024k: (groupid=0, jobs=8): err= 0: pid=1329: Thu Jan 1 00:08:54 1970
read: IOPS=754, BW=755MiB/s (791MB/s)(8192MiB/10857msec)
clat (usec): min=2331, max=16444, avg=10573.01, stdev=646.85
lat (usec): min=2335, max=16447, avg=10575.10, stdev=646.84
clat percentiles (usec):
| 1.00th=[ 9896], 5.00th=[10159], 10.00th=[10159], 20.00th=[10290],
| 30.00th=[10290], 40.00th=[10421], 50.00th=[10421], 60.00th=[10421],
| 70.00th=[ 10552], 80.00th=[ 10683], 90.00th=[ 10945], 95.00th=[ 12518],
| 99.00th=[ 13042], 99.50th=[ 13173], 99.90th=[ 13960], 99.95th=[ 15139],
| 99.99th=[16450]
bw ( KiB/s): min=762938, max=786629, per=100.00%, avg=772720.14, stdev=979.45, samples=168
iops : min= 740, max= 767, avg=749.19, stdev= 1.02, samples=168
lat (msec) : 4=0.01%, 10=1.65%, 20=98.34%
cpu : usr=0.37%, sys=3.81%, ctx=24750, majf=0, minf=2048
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=755MiB/s (791MB/s), 755MiB/s-755MiB/s (791MB/s-791MB/s), io=8192MiB (8590MB), run=10857-10857msec
Disk stats (read/write):
sda: ios=64132/0, merge=0/0, ticks=544319/0, in_queue=544320, util=99.26%
The output above shows that the sequential read speed is 791 MB/s (755 MiB/s).
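fio prints each bandwidth twice, once in MiB/s (binary, 1 MiB = 1048576 bytes) and once in MB/s (decimal, 1 MB = 1000000 bytes). As a quick sanity check, both figures can be recomputed from the io= and run= values in the summary lines. The helper name fio_bw below is hypothetical, and the inputs are copied from the two runs above (8192 MiB transferred in 24631 ms and in 10857 ms).

```shell
#!/bin/sh
# fio_bw IO_MIB RUNTIME_MS
# Prints "<MiB/s> <MB/s>", both rounded to whole numbers.
fio_bw() {
    awk -v mib="$1" -v ms="$2" 'BEGIN {
        secs  = ms / 1000
        mib_s = mib / secs                       # binary megabytes per second
        mb_s  = mib * 1048576 / 1000000 / secs   # decimal megabytes per second
        printf "%.0f %.0f\n", mib_s, mb_s
    }'
}

fio_bw 8192 24631   # sequential write: prints 333 349
fio_bw 8192 10857   # sequential read:  prints 755 791
```

The results match the bandwidth figures in the fio summaries, which confirms that 349 MB/s and 791 MB/s are the decimal-unit values.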
With the continuous development of embedded storage technology and a growing range of application scenarios, embedded storage has become indispensable in many fields such as smart homes, in-vehicle infotainment systems, and mobile devices. In the future, both eMMC and UFS will keep playing important roles in their respective application fields by virtue of their different characteristics.