The details about the S/PDIF protocol are standardized worldwide as IEC-60958. Unfortunately the IEC asks ridiculous amounts of money (hundreds of dollars) for a copy of the standard. As a hobbyist, I don't have wads of cash that big lying around.
The good news is that the Bureau of Indian Standards (BIS) makes its own version of the standard available for non-commercial purposes, and it can be found at:
- https://archive.org/details/gov.in.is.iec.60958.1.2004 (part 1)
- (there is no part 2 as far as I know)
- https://archive.org/details/gov.in.is.iec.60958.3.2003 (part 3, consumer)
- https://archive.org/details/gov.in.is.iec.60958.4.2003 (part 4, professional)
In this project, I want to find a way to decode the information from part 3 (I don't have any devices that use part 4). Before I can get there, I also have to build the hardware and write the software for part 1.
There are many descriptions of the S/PDIF standard online such as this one: http://www.epanorama.net/documents/audio/spdif.html. It has also been described many times in magazines such as Elektor. The following is a description in my own words of the most important parts of the S/PDIF standard, based on the information in the BIS documents.
Scope
In this project, I will only consider PCM-encoded stereo audio (e.g. from a CD player) at 48,000, 44,100 or 32,000 samples per second, 16 bits per sample. The IEC-60958 standard allows more channels, compressed audio, different sample frequencies and up to 24 bits per sample, but I'm not interested in those, and the Propeller probably can't keep up with them anyway. Basically, I want to be able to handle everything that a CD player or DAT recorder can generate.
Biphase Encoding
In order to decode S/PDIF and analyze the subchannel data, we need to know the following. Note: when a list of bits is given, the bits are shown in chronological order, which in this standard is usually LEAST significant bit first. So "0100" means four bits with decimal value 2.
- Each bit is encoded using Biphase modulation. This means that the signal always changes polarity (low to high or high to low) at the beginning of each data bit. For bits with value 0, the polarity doesn't change while the bit is being transmitted, but for bits with value 1, the polarity also switches in the middle of the bit. You could regard each encoded bit as two biphase bits, as follows:
  - Data bit value "0" is represented as biphase "00" if the previous biphase bit was 1.
  - Data bit value "0" is represented as biphase "11" if the previous biphase bit was 0.
  - Data bit value "1" is represented as biphase "01" if the previous biphase bit was 1.
  - Data bit value "1" is represented as biphase "10" if the previous biphase bit was 0.
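The four rules above reduce to two steps: flip at the start of every bit, and flip again in the middle for a 1. The following is a minimal Python sketch of that idea. It is not how the Propeller code will do it, since a real receiver measures pulse widths. The function names are hypothetical, and the code assumes the signal is already sampled as one 0/1 value per biphase cell.

```python
def biphase_encode(data_bits, prev=0):
    """Encode data bits as biphase cells, two cells per data bit.

    prev is the biphase cell sent just before the first data bit.
    """
    cells = []
    for bit in data_bits:
        first = 1 - prev                        # always flip at the start of a bit
        second = 1 - first if bit else first    # a 1 flips again in the middle
        cells += [first, second]
        prev = second
    return cells


def biphase_decode(cells):
    """Turn pairs of biphase cells back into data bits.

    A pair of different cells is a 1, and a pair of equal cells is a 0. This
    sketch does not check for the transition at the start of each bit.
    """
    if len(cells) % 2:
        raise ValueError("expected an even number of biphase cells")
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]


print(biphase_encode([0, 1, 0]))  # [1, 1, 0, 1, 0, 0]
```

The decoder only compares the two halves of each bit, so it doesn't care about the absolute polarity of the signal. That property is convenient, because S/PDIF inputs can easily end up inverted.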
Subframes and Frames
Audio samples are broken up into frames and subframes. A frame is an audio sample for multiple channels (for stereo: left and right), all of which are played at the same time. On the S/PDIF connection, the subframes for the left and right channel are interleaved, with the left subframe transmitted before the right subframe and any further subframes. When a receiver sees a left subframe coming in, it knows that it has received all the subframes for the previous frame (the left subframe is the start of the new frame) and plays all those previous subframes simultaneously while it receives the new left channel subframe.
Encoding of a Subframe
Regardless of the number of bits in an audio sample, a subframe always has 32 bits, numbered 0-31. Bit 0 is transmitted and received first.
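Because a subframe is always 32 bits and a stereo frame has two subframes, the line rate follows directly from the sample rate. The following is a small Python calculation that assumes stereo and two biphase cells per data bit. At 44.1kHz this works out to 2,822,400 data bits per second, or 5,644,800 biphase cells per second (roughly 177ns per cell). At 48kHz it is 6,144,000 cells per second (about 163ns per cell). These are the time budgets the receiver has to meet.

```python
BITS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2     # stereo: one left and one right subframe
CELLS_PER_BIT = 2           # biphase: two cells per data bit


def spdif_rates(sample_rate):
    """Return (data bits/s, biphase cells/s, nanoseconds per cell)."""
    bit_rate = sample_rate * SUBFRAMES_PER_FRAME * BITS_PER_SUBFRAME
    cell_rate = bit_rate * CELLS_PER_BIT
    return bit_rate, cell_rate, 1e9 / cell_rate


for fs in (32000, 44100, 48000):
    bits, cells, ns = spdif_rates(fs)
    print(f"{fs} Hz: {bits} bit/s, {cells} cells/s, {ns:.1f} ns per cell")
```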
- The encoding of bits 0-3 (8 biphase bits) deliberately violates the Biphase encoding, to establish synchronization. This is called the Preamble, and it provides the decoder with a way to synchronize subframes and blocks. See below for more information.
- The audio sample follows the preamble at bits 4 to 27 (inclusive), encoded with lsb first, msb last.
- For CDs, DAT recorders etc., the audio is encoded as a 2's complement signed value. Value 0 is the middle (silence). All binary values that can be represented are valid.
- The audio data can either be up to 24 bits of a single audio sample, or 4 bits of a secondary audio channel, followed by up to 20 bits of the primary channel (the latter format is probably rarely used, if at all).
- If the transmitter supports fewer bits, it should leave the unused bits zero. If the receiver supports fewer bits, it should ignore the least significant bits.
- Bit 28 is a Validity bit which is set to 0 if the audio can be played, or 1 if it shouldn't be played. This is used to suppress the audio e.g. when a CD player is reading the Table Of Contents (TOC) of a disc.
- Bit 29 is the User Data bit, used to transfer the User Data subchannel. More information below.
- Bit 30 is the Channel Status bit, used to transfer the Channel Status subchannel. More information below.
- Bit 31 is a parity bit. It is set so that the number of bits with value 1 among bits 4 to 31 (inclusive) of a subframe is always even.
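As an illustration of the bit layout above, the following is a Python sketch that builds and parses bits 4-31 of a subframe. It makes the following assumptions:

- The sample is 16 bits, placed in the most significant part of the 24-bit audio field (bits 12-27), with bits 4-11 left at zero, as the "fewer bits" rule above describes.
- Bits are represented as a plain list.
- The function names are hypothetical.

```python
def make_subframe(sample, validity=0, user=0, status=0):
    """Build bits 4-31 of a subframe; element 0 of the result is bit 4.

    Assumes a 16-bit signed sample, left-aligned in the 24-bit audio field
    (bits 12-27), with bits 4-11 left at zero.
    """
    if not -32768 <= sample <= 32767:
        raise ValueError("sample must fit in 16 bits")
    word = (sample & 0xFFFF) << 8                 # 2's complement, msb at bit 27
    bits = [(word >> i) & 1 for i in range(24)]   # bits 4-27, lsb first
    bits += [validity, user, status]              # bits 28, 29 and 30
    bits.append(sum(bits) % 2)                    # bit 31: even number of ones
    return bits


def parse_subframe(bits):
    """Check the parity of bits 4-31 and return (sample, validity, user, status)."""
    if len(bits) != 28:
        raise ValueError("expected the 28 bits 4-31")
    if sum(bits) % 2:
        raise ValueError("parity error")
    word = sum(bit << i for i, bit in enumerate(bits[:24]))
    if word & 0x800000:                           # bit 27 set: negative sample
        word -= 1 << 24
    return word >> 8, bits[24], bits[25], bits[26]  # ignore the 8 lowest bits
```

Shifting right by 8 in `parse_subframe` implements the "receiver ignores the least significant bits" rule. It also works for a transmitter that sends 24-bit samples.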
Preambles
As I mentioned, each frame consists of a number of subframes (one for each channel, transmitted in the same order every time, left channel first). Frames are grouped together in blocks of 192 frames (384 subframes when operating in stereo), which is necessary for decoding the subchannel data (more information about subchannels later).
In order to recognize the function of each subframe, the standard defines three different preambles: B, M and W:
- The B preamble is encoded as biphase 11101000 (following a 0 biphase bit) or 00010111 (following a 1 biphase bit). The sample in the subframe is for the left channel and indicates the beginning of a block.
- The M preamble is encoded as biphase 11100010 (following a 0 biphase bit) or 00011101 (following a 1 biphase bit). The sample in the subframe is for the left channel but it's not the beginning of a block.
- The W preamble is encoded as biphase 11100100 (following a 0 biphase bit) or 00011011 (following a 1 biphase bit). The sample in the subframe is for the right channel.
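Every preamble contains a run of three identical biphase bits. Valid biphase data can never produce such a run, because each data bit starts with a transition, so at most two equal biphase bits appear in a row. That run is the violation that makes preambles recognizable. Each preamble also starts with a transition, so its first biphase bit tells you which of the two listed forms was sent. The following Python sketch uses this to classify 8 biphase bits. The function name is hypothetical, and the code assumes the receiver has already collected the 8 cells.

```python
# Preambles in the form sent after a 0 biphase bit, as listed above.
PREAMBLES = {
    (1, 1, 1, 0, 1, 0, 0, 0): "B",   # left channel, start of a block
    (1, 1, 1, 0, 0, 0, 1, 0): "M",   # left channel
    (1, 1, 1, 0, 0, 1, 0, 0): "W",   # right channel
}


def classify_preamble(cells):
    """Return 'B', 'M' or 'W' for 8 biphase bits, or None if no preamble."""
    cells = tuple(cells)
    if len(cells) != 8:
        return None
    if cells[0] == 0:
        # Sent after a 1 biphase bit: compare against the inverted form.
        cells = tuple(1 - c for c in cells)
    return PREAMBLES.get(cells)
```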
Subchannels
Each subframe has a User Data bit and a Channel Status bit, as mentioned. These are used as auxiliary data streams of one bit per subframe per subchannel. For synchronization, the B preamble (which marks the first left-channel subframe of a block) is used to mark the first bit in the stream of subchannel data.
The User Data subchannel bits can differ from one subframe to another, so there are 384 User Data subchannel bits in a block. More information about their interpretation follows below.
The Channel Status bit is normally the same in both subframes of a frame, so it effectively changes only once per frame. In other words, there are 192 Channel Status bits per block. More information about their interpretation follows below.
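Collecting the subchannel bits into blocks can be sketched in Python as follows. The sketch assumes a lower layer already delivers, for each subframe, its preamble letter plus its User Data and Channel Status bits. The function and tuple layout are hypothetical. Following the description above, the Channel Status bit is taken from the left (B or M) subframe only, on the assumption that both subframes carry the same value.

```python
def collect_blocks(subframes):
    """Group subframes into blocks and yield (user_bits, status_bits).

    subframes: iterable of (preamble, user_bit, status_bit) tuples, where
    preamble is 'B', 'M' or 'W'. Subframes before the first 'B' are skipped,
    and incomplete blocks are dropped.
    """
    user, status = None, None
    for preamble, u, c in subframes:
        if preamble == "B":                   # a new block starts here
            if user is not None and len(user) == 384:
                yield user, status
            user, status = [], []
        if user is None:
            continue                          # not synchronized to a block yet
        user.append(u)                        # one User Data bit per subframe
        if preamble in ("B", "M"):
            status.append(c)                  # one Channel Status bit per frame
    if user is not None and len(user) == 384:
        yield user, status
```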
User Data Subchannel
Bit 29 of each subframe is the User Data bit. The User Data Bit may change for each subframe so there are up to 384 significant bits (numbered 0-383) in a block, which together form the User Data subchannel.
- For audio CD, the user data bits in a block are divided into 4 "packs" (1-4) of 24 bits each.
  - The first pack contains only zero-bits. I think I read somewhere that this may be used to generate track markers (the bits become 1 for 2 seconds before the new track begins, and the last block that has one-bits is the first block of the new track). However, this is not in the standard, so I may be wrong.
  - The other 3 packs each contain a bit with value 1, followed by the Q,R,S,T,U,V,W subchannel bits defined in the Red Book, followed by four zero-bits. Each group of six bits R,S,T,U,V,W is called a Symbol.
- The standard describes an improvement on this, called the General User Data Format. This is to be used for new types of media, and it's backwards compatible with the format used on CD Audio.
  - The user data bits (one per subframe, 384 per block) form one continuous stream.
  - The data contains Messages separated by at least 8 zero-bits.
  - Each Message contains between 3 and 129 (but not 96) "Information Units" (IUs). A length of 96 IUs is reserved as a special format for Audio CD and Minidisc.
  - Each IU contains one start bit "1" followed by 7 bits of data (Q,R,S,T,U,V,W).
  - IUs in the same message are separated by zero, one or two stop bits with value 0.
  - The first IU of a message contains two one-bits (the start bit and Q), followed by 6 bits that represent a class of information. Bit pattern 000 for R,S,T is reserved for Digital Compact Cassette; the other values are available for use.
  - The second IU of a message contains the number of IUs that follow it, but this number is encoded msb first, the opposite of the usual order. For example, the bits "10001010" (including the start bit) signify that 10 IUs (binary 0001010) follow the second IU as part of the same message. The number can be between 1 and 127, but the value 94 is illegal because it would make the message incompatible with the Audio CD format.
  - The third IU contains the original category code (see the category codes below) of the device that generated the user data, without the L bit. Example: "1 110 0000" signifies that the original category code was a DAT recorder ("110 0000L" with a start bit in front and the L bit removed).
  - The following IUs contain the user information: the original data is put into the six R,S,T,U,V,W bits, six bits at a time. The Q bit can be set to 1 to indicate that one of the bits R to W contains an error.
  - Since only 6 bits of data fit into one IU, bytes are shifted and combined as necessary. For example, to transmit two bytes, the first IU contains bits 7,6,5,4,3,2 of the first byte in R,S,T,U,V,W (in that order; msb first!), the second IU contains bits 1 and 0 of the first byte and bits 7-4 of the second byte, and the third IU contains bits 3-0 of the second byte, followed by 2 zero-bits for padding.
- The Annexes for each product category (see category codes below) state a specific format for some categories. I'll get back to this in the future.
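Two details of the General User Data Format are easy to get backwards: the msb-first length in the second IU, and the way bytes are spread over the 6-bit R..W fields. The following Python sketch covers only those two steps. It does not handle message framing, stop bits or the class and category IUs, and all function names are hypothetical.

```python
def iu_message_length(iu_bits):
    """Return how many IUs follow the second IU of a message.

    iu_bits: the 8 bits of the second IU in order of arrival, start bit
    first. The 7 data bits are read msb first, unlike most fields.
    """
    if len(iu_bits) != 8 or iu_bits[0] != 1:
        raise ValueError("an IU is a start bit 1 followed by 7 data bits")
    value = 0
    for bit in iu_bits[1:]:
        value = (value << 1) | bit
    return value


def pack_bytes_into_groups(data):
    """Split bytes into 6-bit [R, S, T, U, V, W] groups, msb first.

    The last group is padded with zero-bits.
    """
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    bits += [0] * (-len(bits) % 6)     # pad up to a multiple of 6 bits
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


def make_iu(group, q=0):
    """Prefix a 6-bit R..W group with the start bit and the Q (error) bit."""
    return [1, q] + list(group)
```

For example, `iu_message_length([1, 0, 0, 0, 1, 0, 1, 0])` gives 10, matching the example above. Two bytes produce three groups, with the last one ending in two padding zeros.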
Channel Status Subchannel
Bit 30 of each subframe is the Channel Status bit. In consumer stereo equipment, the channel status bits of the two subframes in a frame are normally identical (apart from the channel number in bits 20-23, if it is used). So there are 192 bits per block (numbered 0-191) that make up the Channel Status subchannel.
- Bit 0 is 0 to indicate that the frame and subframe are in the consumer format. When this is set to 1, other rules apply (from part 4 of the standard instead of part 3), which I won't go into in this document.
- Bit 1 is 0 to indicate that the frame is in PCM format. When this is 1, the audio is encoded differently (which I won't go into in this document).
- Bit 2 is the "Cp" bit indicating that the recording is covered by copyright (0) or not (1). This is used by the SCMS copy protection system which is used by all consumer digital audio recorders. The bit may alternate between 0 and 1 at a rate between 4Hz and 10Hz.
- Bits 3, 4 and 5 are the audio mode. For PCM (bit 1=0), they are:
  - bit 5=0 bit 4=0 bit 3=0 for 2 audio channels without pre-emphasis
  - bit 5=0 bit 4=0 bit 3=1 for 2 audio channels with 50μs / 15μs pre-emphasis (meaning an analog filter is needed to play the audio; I won't go into this)
  - Other values are reserved.
- Bits 6 and 7 are zero, indicating that the rest of the channel status data is formatted as described below. Other values are reserved and indicate that the following fields should be ignored.
- Bits 8-15 (byte 1) are a category code, consisting of a 3-bit category group followed by a 4-bit category in that group. The L bit is used by SCMS, which I won't go into here.
  - "000 0XXXL" = Undefined; reserved except for:
    - "000 00000" = General, for temporary use
    - "000 0001L" = Experimental product, not for commercial release
  - "000 1XXXL" = Solid state memory based products
    - "000 1000L" = Digital audio recorder/player using solid state memory
  - "100 XXXXL" = Laser optical products
    - "100 00000" = Compact Disc Digital Audio (Annex A)
    - "100 1000L" = Laser Optical Digital Audio (Annex D)
    - "100 1100L" = DVD (Annex P)
    - "100 1001L" = Minidisc (Annex N)
  - "010 XXXXL" = Digital converters, signal processing
    - "010 0000L" = PCM encoder/decoder (Annex B)
    - "010 0100L" = Digital Audio Mixer (Annex E)
    - "010 1100L" = Sample Rate Converter (Annex F)
    - "010 0010L" = Digital Sound Sampler (Annex G)
    - "010 1010L" = Digital Sound Processor (Annex O)
  - "110 XXXXL" = Magnetic tape or disc based products
    - "110 0000L" = DAT recorder (Annex C)
    - "110 1000L" = Video tape recorder with digital sound
    - "110 0001L" = Digital Compact Cassette (Annex M)
  - "001 XXXXL" = Broadcast reception
    - "001 0000L" = Digital Broadcast Receiver, Japan (Annex H)
    - "001 1000L" = Digital Broadcast Receiver, Europe (Annex J)
    - "001 0001L" = Software Delivery Interface (Annex L)
    - "001 0011L" = Digital Broadcast Receiver, USA (Annex K)
  - "101 XXXXL" = Musical instruments, microphones etc. without copyright info
    - "101 0000L" = Synthesizer
    - "101 1000L" = Microphone
  - "011 00XXL" = A/D converters without copyright information
    - "011 0000L" = A/D converter
  - "011 01XXL" = A/D converters with copyright information in Cp and L bit
    - "011 0100L" = A/D converter
  - "011 1XXXL" = Broadcast reception (Reserved)
- Bits 16-19 are a source number: a 4-bit binary number used in systems with multiple inputs. Value "0000" means ignore this number.
- Bits 20-23 are the channel number for multi-channel audio.
  - "0000" means ignore
  - "1000" means left channel in stereo systems
  - "0100" means right channel
  - etc.
- Bits 24-27 indicate the sampling frequency:
  - "0010" = 22.05kHz
  - "0000" = 44.1kHz
  - "0001" = 88.2kHz
  - "0011" = 176.4kHz
  - "0110" = 24kHz
  - "0100" = 48kHz
  - "0101" = 96kHz
  - "0111" = 192kHz
  - "1100" = 32kHz
  - "1000" = ignore this number
  - Other values are reserved
- Bits 28-29 indicate the clock accuracy (which I won't explain further)
  - "00" = level II
  - "10" = level I
  - "01" = level III
  - "11" = frame rate doesn't match sample frequency
- Bits 30 and 31 are not in use (reserved)
- Bit 32 indicates whether the lowest 4 bits of each sample are used for an auxiliary audio signal (bit=0) or whether they are (or can be) used to extend the 20 bits of the original S/PDIF to 24 bits (bit=1).
- Bits 33-35 indicate the sample word length:
  - "000" = word length not indicated
  - "100" = 20 bits (bit 32=1) or 16 bits (bit 32=0)
  - "010" = 22 bits or 18 bits
  - "001" = 23 bits or 19 bits
  - "101" = 24 bits or 20 bits
  - "011" = 21 bits or 17 bits
- Bits 36-39 indicate the original sampling frequency:
  - "1111" = 44.1kHz
  - "1110" = 88.2kHz
  - "1101" = 22.05kHz
  - "1100" = 176.4kHz
  - "1011" = 48kHz
  - "1010" = 96kHz
  - "1001" = 24kHz
  - "1000" = 192kHz
  - "0110" = 8kHz
  - "0101" = 11.025kHz
  - "0100" = 12kHz
  - "0011" = 32kHz
  - "0001" = 16kHz
  - "0000" = unknown
  - Other values are reserved.
- Bits 40-191 are not in use (reserved)
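The following Python sketch turns a block of 192 Channel Status bits into the consumer-format fields listed above. It takes the code tables straight from those lists, in order of arrival. It returns None for "not indicated" and unknown codes, and it stops at the mode bits when they are not "00". The function and dictionary names are hypothetical. The sketch skips the original sampling frequency and does not interpret the category codes beyond returning them as bit strings.

```python
def lsb_first(bits):
    """Interpret bits sent lsb first (as in this standard) as an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# Bits 24-27 in order of arrival, from the list above.
SAMPLE_RATES = {
    "0000": 44100, "0100": 48000, "1100": 32000,
    "0010": 22050, "0110": 24000, "0001": 88200,
    "0101": 96000, "0011": 176400, "0111": 192000,
}

# Bits 33-35 -> (length when bit 32 is 0, length when bit 32 is 1).
WORD_LENGTHS = {
    "100": (16, 20), "010": (18, 22), "001": (19, 23),
    "101": (20, 24), "011": (17, 21),
}


def decode_channel_status(cs):
    """Decode the consumer-format fields of one 192-bit channel status block.

    cs[0] is bit 0. Fields that are "not indicated", or that use a code this
    sketch doesn't know, come back as None.
    """
    if len(cs) != 192:
        raise ValueError("expected 192 channel status bits")

    def field(first, last):
        # Bits first..last (inclusive) as a string, in order of arrival.
        return "".join(str(bit) for bit in cs[first:last + 1])

    if cs[0] == 1:
        return {"professional": True}         # part 4 rules, not handled here
    info = {
        "professional": False,
        "pcm": cs[1] == 0,
        "copyright": cs[2] == 0,              # Cp = 0: copyright asserted
        "pre_emphasis": field(3, 5) == "100",  # bit 3 = 1, bits 4 and 5 = 0
        "mode": field(6, 7),
    }
    if info["mode"] != "00":
        return info                           # reserved: ignore the rest
    lengths = WORD_LENGTHS.get(field(33, 35))
    info.update({
        "category": field(8, 14),             # category code without the L bit
        "l_bit": cs[15],
        "source": lsb_first(cs[16:20]) or None,    # 0 means ignore
        "channel": lsb_first(cs[20:24]) or None,   # 1 = left, 2 = right
        "sample_rate": SAMPLE_RATES.get(field(24, 27)),
        "clock_accuracy": field(28, 29),
        "word_length": lengths[cs[32]] if lengths else None,
    })
    return info
```

Because the category is returned as a bit string in order of arrival, it can be compared directly with the list above. For example, "1000000" is the Compact Disc category.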
Conclusion
The above is enough information to get started on decoding biphase data into regular bits, subchannels into interesting data, and (if you want) play some music while you're doing all that.