
Some capture measurements [UPDATED]

A project log for Streamo Encodo Cheapo

Trying to make an inexpensive HDMI encoder with off the shelf stuff

WJCarpenter, 08/10/2024 at 20:50

I finally got around to doing some video captures with the dongles on a couple of Raspberry Pi boards. For these experiments, the OS was Raspberry Pi OS Lite (64-bit), aka Debian Bookworm, running from an SD card. The "lite" OS does not include a graphical desktop. I chose to avoid that overhead since my experiments would be driven from the command line anyhow. I also tested on an HP EliteDesk 800 G3 (i7-6700T at 2.80 GHz, 8 GB RAM, 8 threads).

The dongle I used is the one with the MS2130 chip. The one I have has a USB-C connector, and I used a simple mechanical USB-C to USB-A adapter. The RPi 3 has only USB 2.0 ports, but I used a USB 3.0 port on the RPi 4. For the HDMI input, I used a Google Chromecast dongle (the HD version, not the 4K version). It was just repeatedly displaying various setup animations. I don't think there was any audio. In any case, I didn't try to capture any audio because I knew the video would dominate the performance challenge. The CPU figures below come from eyeballing htop and averaging across all 4 cores for the RPi boards and all 8 threads for the x86_64 box.
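
Eyeballing htop is fine for rough comparisons. If a repeatable number is wanted, one option is to sample the aggregate `cpu` line of /proc/stat twice and compute the busy share from the difference, which corresponds to an average across all cores. What follows is a minimal sketch of that idea, not the method used for the figures in this log. The names cpu_totals, busy_pct and sample_cpu are made up for illustration, and the 10-second default window is arbitrary.

```shell
#!/bin/sh
# Print "total idle" jiffies from an aggregate "cpu ..." line of /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal.
# The guest fields are skipped because they are already counted in user/nice.
cpu_totals() {
    set -- $1          # split the line into words
    shift              # drop the "cpu" label
    echo "$(( $1 + $2 + $3 + $4 + $5 + $6 + $7 + ${8:-0} )) $(( $4 + $5 ))"
}

# Busy percentage between two samples, rounded down.
# Arguments: total1 idle1 total2 idle2
busy_pct() {
    dt=$(( $3 - $1 ))
    di=$(( $4 - $2 ))
    if [ "$dt" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( ($dt - $di) * 100 / $dt ))
}

# Sample /proc/stat twice, $1 seconds apart (default 10), and print busy %.
sample_cpu() {
    a=$(cpu_totals "$(head -n 1 /proc/stat)")
    sleep "${1:-10}"
    b=$(cpu_totals "$(head -n 1 /proc/stat)")
    busy_pct $a $b     # unquoted on purpose: each sample is two words
}
```

Running `sample_cpu 30` in a second terminal while ffmpeg is capturing would print one integer percentage for that 30-second window.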

Test 1:

This first test tries to capture what I ultimately need: an MPEG-TS stream at FHD resolution. The capture dongle outputs 50 fps at that resolution for MJPEG. The command transcodes that into MPEG-2.

ffmpeg -f v4l2 -video_size 1920x1080 -input_format mjpeg -i /dev/video0 -f mpegts out.mkv 

RPi 3: Transcode 11 fps. CPU about 50%.

RPi 4: Transcode 25 fps. CPU about 50%.

x86_64: Transcode 50 fps. CPU about 13%.

Test 2:

This is the same experiment, but with YUYV input instead of MJPEG. The dongle's advertised rate is only 10 fps.

ffmpeg -f v4l2 -video_size 1920x1080 -input_format yuyv422 -i /dev/video0 -f mpegts out.mkv

RPi 3: Transcode 8+ fps. CPU about 35%.

RPi 4: Transcode 10 fps. CPU about 20%.

x86_64: Transcode 10 fps. CPU about 4%.
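
The frame rates that the dongle advertises for each pixel format (50 fps for MJPEG and 10 fps for YUYV at 1920x1080, as quoted in Tests 1 and 2) can be listed before capturing. Here is a minimal sketch, assuming the dongle shows up as /dev/video0 as in the commands above. The function name list_modes is hypothetical, and v4l2-ctl comes from the v4l-utils package, which may need to be installed separately.

```shell
#!/bin/sh
# list_modes is a made-up helper name, not part of any tool.
# $1 = capture device (defaults to /dev/video0).
list_modes() {
    dev="${1:-/dev/video0}"
    if [ ! -c "$dev" ]; then
        echo "no capture device at $dev" >&2
        return 1
    fi
    if command -v v4l2-ctl >/dev/null 2>&1; then
        # Lists each pixel format with its frame sizes and frame intervals.
        v4l2-ctl --device="$dev" --list-formats-ext
    else
        # Fallback: ffmpeg's v4l2 input lists pixel formats and frame sizes.
        ffmpeg -hide_banner -f v4l2 -list_formats all -i "$dev"
    fi
}
```

Calling `list_modes` with no argument checks /dev/video0. Another device node can be passed as the first argument.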

Test 3:

This is a repeat of Test 1, but with the "ultrafast" preset for H.264.

ffmpeg -f v4l2 -video_size 1920x1080 -input_format mjpeg -i /dev/video0 -preset ultrafast -f mpegts out.mkv

RPi 3: Transcode 11 fps. CPU about 50%.

RPi 4: Transcode 25 fps. CPU about 40%.

x86_64: Transcode 50 fps. CPU about 14%, with one thread outlier of 35%.

Test 4:

The same as Test 3, but with YUYV input.

ffmpeg -f v4l2 -video_size 1920x1080 -input_format yuyv422 -i /dev/video0 -preset ultrafast -f mpegts out.mkv

RPi 3: Transcode 8+ fps. CPU about 35%.

RPi 4: Transcode 10 fps. CPU about 20%.

x86_64: Transcode 10 fps. CPU about 4%.

Test 5:

This test captures the incoming video without transcoding. For some scenarios (not mine), this allows for later transcoding as a separate step. 

ffmpeg -f v4l2 -video_size 1920x1080 -input_format mjpeg -i /dev/video0 -c copy out.mkv

RPi 3: Capture 50 fps. CPU less than 5%.

RPi 4: Capture 50 fps. CPU less than 5%.

x86_64: Capture 50 fps. CPU less than 1% with some threads completely idle.

Test 6:

The same as Test 5, but with YUYV input.

ffmpeg -f v4l2 -video_size 1920x1080 -input_format yuyv422 -i /dev/video0 -c copy out.mkv

RPi 3: Capture 4+ fps. CPU less than 5%.

RPi 4: Capture 8+ fps. CPU less than 5%.

x86_64: Capture 50 fps. CPU less than 5% with some threads completely idle.
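
The six commands above run until interrupted. For repeatable comparisons, each one could be limited to a fixed duration with ffmpeg's `-t` output option. The sketch below only prints the command lines, so nothing touches the dongle until the output is piped to a shell. The helper name capture_cmd and the 30-second duration are assumptions for illustration, and the order (all MJPEG variants first, then all YUYV variants) differs from the test numbering.

```shell
#!/bin/sh
# capture_cmd is a hypothetical helper that only prints a command line.
# $1 = input format (mjpeg or yuyv422), $2 = seconds to record,
# remaining arguments = output options copied from the tests above.
capture_cmd() {
    fmt=$1
    secs=$2
    shift 2
    # -nostdin keeps ffmpeg from reading the terminal, -y overwrites out.mkv,
    # -t stops writing once the output reaches the given duration.
    echo "ffmpeg -nostdin -y -f v4l2 -video_size 1920x1080 -input_format $fmt -i /dev/video0 -t $secs $* out.mkv"
}

# Print all six variants, 30 seconds each.
for fmt in mjpeg yuyv422; do
    capture_cmd "$fmt" 30 -f mpegts
    capture_cmd "$fmt" 30 -preset ultrafast -f mpegts
    capture_cmd "$fmt" 30 -c copy
done
```

Saving this as a script and piping its output to `sh` would run the six captures back to back, each overwriting out.mkv. The fps value in ffmpeg's progress line is the figure to compare.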

Observations:

It's no surprise that the RPi 4 performed better than the RPi 3. For both boards, the only test that kept up with the dongle's full frame rate was the raw MJPEG capture (Test 5). Transcoding YUYV kept up on the RPi 4 (and nearly so on the RPi 3), but only with the dongle's 10 fps limit for that format, which is not acceptable for entertainment video. The x86_64 box was able to keep up at the full frame rate with plenty of headroom. In a future experiment, I'll connect multiple dongles to see how it handles them. The x86_64 box also has a single USB-C port, so I gave that a quick try with Test 1. It resulted in a higher frame rate (60 fps) at the cost of slightly higher CPU. It might be possible to operate multiple dongles on that single USB-C port with a true USB-C hub. In contrast, the box has 6 USB-A ports.

The raw video captures, even on the RPi 3, used very little CPU, so it is probably feasible to use an RPi 3 or RPi 4 as a capture device for multiple HDMI dongles. I didn't try it, but I suspect that an RPi 2 would also be up to the task. All of that obviously just delays the heavy lifting of the transcoding job. For simply playing the videos (for example, in VLC), the raw videos are fine and would work as an intermediate step in some video production pipeline. For capturing livestreaming video in near real time, raw capture is not very suitable because the files are enormous. I did an experiment of sending the raw video to ffmpeg's built-in streaming server. ffmpeg warned me that the video was being muxed in as a private stream and that my player might not recognize it. That turned out to be the case when I tried to play the stream with VLC (it just kept reading but displayed a black window).
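
To put a number on "enormous", here is a back-of-the-envelope sketch for the uncompressed YUYV case. YUYV (4:2:2) stores 2 bytes per pixel, so the data rate follows from resolution and frame rate alone. The function names are made up for illustration. The raw MJPEG files are smaller by whatever compression ratio the dongle achieves, which is not measured here.

```shell
#!/bin/sh
# Bytes per second of uncompressed YUYV (4:2:2, 2 bytes per pixel).
# Arguments: width height fps
yuyv_bytes_per_sec() {
    echo $(( $1 * $2 * 2 * $3 ))
}

# Whole gigabytes (10^9 bytes) written per hour at a given byte rate.
gb_per_hour() {
    echo $(( $1 * 3600 / 1000000000 ))
}

rate=$(yuyv_bytes_per_sec 1920 1080 10)
echo "1080p YUYV at 10 fps: $rate bytes/s, about $(gb_per_hour "$rate") GB/hour"
```

This prints 41472000 bytes/s and about 149 GB/hour for the dongle's 10 fps YUYV mode.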

When I copied the raw YUYV capture to another machine and viewed it in VLC, there were some obvious "jaggies" artifacts in what should have been solid lines. In contrast, the raw MJPEG showed the lines smoothly. That could just be due to using a cheap HDMI capture dongle.

One could move up to an RPi 5 (which I don't have on hand) to get better performance, but that starts to approach the price point of the dedicated HDMI encoder devices. Second-hand x86_64 boxes look like a better bet for this scenario at the cost of greater power consumption.
