This came about because I bought a box from Belkin to do AirPlay 2 for our patio speakers (which have an amplifier I built myself). That Belkin box's analog audio output is about 20 dB too low, but fortunately they include a TOSLINK output jack. Buying the aforementioned $10 box from Amazon solved my problem, but I kinda wanted to know how to build the equivalent, and having done so, try to do better.
The current design comes down to three separate subsections.
First is a TOSLINK receiver module. This is a little plastic thing that takes in 5 volt power and outputs a TTL-level stream matching the optical input. There isn't a whole lot to say about it: it's self-contained and just needs a single bypass cap and a series inductor on the supply pin. The optical signaling uses biphase mark encoding, which makes for easy clock extraction. The signal changes state at the start of every bit period. If the bit is a 0, then nothing else happens. If it's a 1, then at the 180º mark there is a second state change. So for an input bit clock of 1 MHz (just an example), a series of 00000... is represented by a 500 kHz square wave and a series of 11111... is represented by a 1 MHz square wave. Mixed data falls in between, and the signal never goes longer than one bit period without changing state.
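To make the encoding concrete, here is a minimal Python sketch of a biphase mark encoder and decoder. It assumes the usual S/PDIF convention (a state change at the start of every bit, plus a second one mid-bit for a 1) and ignores the frame preambles, which deliberately break the coding rule. The function names are mine, purely for illustration.

```python
def biphase_mark_encode(bits, level=0):
    """Encode bits as line levels, two half-bit samples per input bit.

    `level` is the line state just before the first bit (an arbitrary
    starting assumption; the code is polarity-insensitive).
    """
    out = []
    for bit in bits:
        level ^= 1           # state change at the start of every bit period
        out.append(level)
        if bit:
            level ^= 1       # a 1 adds a second state change at the 180º mark
        out.append(level)
    return out


def biphase_mark_decode(halves):
    """Recover bits: a bit whose two halves differ is a 1, otherwise a 0."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


if __name__ == "__main__":
    print(biphase_mark_encode([0, 0, 0, 0]))  # toggles once per bit
    print(biphase_mark_encode([1, 1, 1, 1]))  # toggles twice per bit
```

Because the decoder only looks at whether the two halves of a bit differ, it doesn't care about the absolute polarity of the signal.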
The next section is the digital data stream receiver. For this, I've gone through four different choices. The first iteration used the STA120. That worked, but it only comes as a SOIC28, which is rather large, and it seemed to be heading towards obsolescence (did I mention that this whole project is at least 10 years out of fashion?). Replacing that was the DIR9001, which was a good choice, but like the STA120 it was limited to 96 kHz sample rates. To go higher, I needed to find a chip that was still capable of being configured purely with hardware strapping. The best choice I could find was the CS8416. It can go up to 200 kHz sample rates and can be strapped with eight 47 kΩ pull-up or pull-down resistors. The CS8416 can also take care of any de-emphasis for us, allowing us to just strap the DAC for no de-emphasis. In all three of these cases, the output was configured for I2S, with the receiver being the master and a master clock of 256x the L/R clock. The problem with the CS8416 is that without any input, it clocks the output at around 750 kHz, which is the minimum VCO clock for the PLL. Sending that slow a clock into the PCM1793 results in a low-level hiss on the output even though MUTE is asserted by the error output. This prompted me to try the WM8804. That took care of the hiss, though it does require adding a 12 MHz crystal to the BOM. The other downside is that in hardware mode, there's no support for reporting or handling de-emphasis, but so far I've not encountered a source that used it.
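For a sense of the clock rates involved, here is a small Python sketch that works out the I2S clocks for a few common sample rates. The 256x master clock ratio is the one strapped on the receiver; the 64x bit clock ratio (two channels of 32-bit slots) is my assumption for illustration, and the actual ratio depends on the receiver's output format.

```python
MCLK_RATIO = 256  # master clock = 256 x LRCK, as configured on the receiver
BCLK_RATIO = 64   # ASSUMPTION: 2 channels x 32-bit slots per frame


def i2s_clocks(fs_hz):
    """Return the LRCK, BCK and SCK frequencies in Hz for a sample rate."""
    return {
        "LRCK": fs_hz,
        "BCK": fs_hz * BCLK_RATIO,
        "SCK": fs_hz * MCLK_RATIO,
    }


if __name__ == "__main__":
    for fs in (44_100, 48_000, 96_000, 192_000):
        clocks = i2s_clocks(fs)
        print(f"Fs={fs / 1e3:g} kHz  "
              f"BCK={clocks['BCK'] / 1e6:.4f} MHz  "
              f"SCK={clocks['SCK'] / 1e6:.4f} MHz")
```

At 192 kHz the master clock works out to 49.152 MHz, so these lines need a bit more care in layout than the 11.2896 MHz you get at CD rates.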
The original DAC was a CS4334. It takes I2S input, which consists of the master clock (SCK), bit clock (BCK), L/!R (LRCK), and DATA. Instead of the BCK signal, you can send a de-emphasis selection signal from the STA120, but when I attempted this with a prototype the audio sounded noisy. It would seem that when you don't send BCK, the CS4334 makes assumptions about the relationship between SCK and the frequency of LRCK to derive an internal bit clock, and this seems to not work for the output format of the STA120. On top of that, the !DEM signal was always high, which implies that emphasis on the digital signal is never used, so there's no harm in not supplying that signal to the DAC.
Along the way I decided to attempt to design something a bit better. This started with the PCM1793 192 kHz 24 bit DAC. That DAC has fairly impressive THD and S/N specs, and requires an external differential to single-ended converter and LPF. That stage uses the OPA2134 dual op amp, which is also an impressive part in and of itself. As before, the receiver's DATA, BCK, LRCK and SCK outputs (configured at the receiver for 256xFs) are fed directly to the DAC. In addition, I decided to connect the ERR output of the decoder to the MUTE input of the DAC, just to ensure bad data doesn't turn into bad sound. Since the DAC has separate de-emphasis selection pins, I decided to try to support it for the sake of old CD players and the like. Unfortunately, the DIR9001 only has a single de-emphasis output signal and there are four possible de-emphasis configurations, depending on sample rate. It's not convenient to try to figure out which mode to use, so I settled on supporting only 44.1 kHz de-emphasis, which is the only one likely to be used. This can be accomplished simply by sending the emphasis signal from the DIR9001 to one of the de-emphasis selector pins and grounding the other. For the upgraded CS8416 receiver, de-emphasis at the DAC is disabled, as the receiver can perform the de-emphasis itself.
One remaining issue is that the analog output of the PCM1793 is differential and requires a conversion stage to get normal analog audio output. The datasheet has a suggested output amp / filter stage, but it requires bipolar power. The only ways around this are to ditch USB power, to require USB PD to request 9 volts (which is silly given that we don't require more than about 1 watt of total power), or to use some sort of switching inverter to derive a negative supply. It turns out that the last option can be done without injecting huge amounts of noise with a carefully chosen charge pump.
The MAX1721 has a switching frequency of 125 kHz, which is comfortably above the audio range. Even better is the SP6661, which has a switching frequency of just under a megahertz. Keeping the switching frequency high means that even if any of the switching leaks through to the output, it won't be audible. But with some careful filtering, we can keep the noise at bay. Besides, a proper line level would require more than 5 V p-p in any event (10 V p-p, which is to say ±5 V, is more than enough).
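As a rough illustration of why the higher switching frequency helps, here is a Python sketch of how much a simple first-order RC low-pass on the negative rail would attenuate ripple at each frequency. The filter topology and the R and C values are hypothetical, picked only for the example; they are not the values used on the board.

```python
import math

R_OHMS = 10.0      # HYPOTHETICAL series resistance
C_FARADS = 10e-6   # HYPOTHETICAL shunt capacitance


def rc_attenuation_db(freq_hz, r=R_OHMS, c=C_FARADS):
    """Attenuation of a first-order RC low-pass at freq_hz, in dB (positive)."""
    cutoff_hz = 1.0 / (2.0 * math.pi * r * c)
    return 10.0 * math.log10(1.0 + (freq_hz / cutoff_hz) ** 2)


if __name__ == "__main__":
    # 900 kHz stands in for "just under a megahertz" (an assumption).
    for name, f in (("MAX1721", 125e3), ("SP6661", 900e3)):
        print(f"{name}: {rc_attenuation_db(f):.1f} dB at {f / 1e3:.0f} kHz")
```

Well above the cutoff a first-order filter rolls off at 20 dB per decade, so the same filter knocks down the faster charge pump's ripple by roughly 17 dB more, on top of that ripple being further from the audio band to begin with.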
All three sections are, more or less, copied straight from their respective datasheets.
I had a chance to take apart the $10 Amazon box. It looked more or less the same - a big SSOP28 chip (the STA120 is SOIC28), a SOIC8 (like the CS4334) and a bunch of passive components. I wasn't able to identify the components on the commercial version, unfortunately.
Along the way, I did some listening tests. I didn't try to do them blind or anything, so you could argue that it's all just the placebo effect, but to my ears the PCM1793 sounded much warmer and fuller than the CS4334 or the $10 Amazon box. If you look around on various audio forums, you'll see others compliment the 1793's sound as well. The parts for the upgraded version are more expensive, but from what I can tell, it does make a difference.