The last draft gave a broad outline of the system. Let's refine it.
The very first defining characteristic is that the data stream must be transmittable over an S/PDIF link. Hence the 32 bits per atomic datum. This is equivalent to what a CD player outputs, but instead of sounds, the data represent sensor values.
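For scale, here is a quick back-of-the-envelope check of the payload rate. This is a sketch assuming a CD-rate link where one stereo frame (2 × 16-bit samples) carries one 32-bit atomic datum:

```python
# Raw payload rate of a CD-rate S/PDIF link carrying 32-bit words:
# one stereo frame (2 x 16-bit samples) = one 32-bit atomic datum.
SAMPLE_RATE_HZ = 44_100   # CD frame rate
BITS_PER_WORD = 32        # one atomic datum per stereo frame

payload_bits_per_s = SAMPLE_RATE_HZ * BITS_PER_WORD
print(payload_bits_per_s)  # 1411200, i.e. ~1.41 Mbit/s of raw payload
```

So the whole chain shares about 1.4 Mbit/s of in-band payload, which is what the bandwidth section below has to budget.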
- The stream could be played back from a pre-recorded audio CD, for example.
- Conversely, the stream could be recorded to a CD, but not to a MiniDisc, because MiniDisc uses lossy compression.
- The medium can reuse cheap TOSLINK transceivers and fibres, or 75 Ω RCA connectors and patch cables, but encapsulation in UDP over RJ45 could also work.
- The stream could be received and emitted by a DAW (Digital Audio Workstation) that recognises this specific type of data.
- Since it is considered raw audio data by the underlying components, the stream is transparent and could even find its way into web browsers thanks to the HTML5 Web Audio API. N00N could even encapsulate other one-way protocols over these links...
- S/PDIF/TOSLINK transmits ancillary data in extra bits, but these are not accessible to most data sinks. For example, track names can be sent by a CD player, but they are not available in the raw audio data stream. Hence ALL the management data must be transmitted "in band", which is nice because it allows easy storage in a dumb computer file.
- The inactive state of the stream is a run of "0" samples. Zero samples must separate the packets on a continuous stream (this eases parsing a bit).
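As a rough illustration of how zero-separated framing eases parsing, here is a hypothetical splitter working on a stream of 32-bit words (the real packet format is not specified here, so this only shows the delimiting logic):

```python
def split_packets(words):
    """Split a stream of 32-bit words into packets, using runs of
    zero words as separators (the stream's idle/inactive state).
    Illustrative sketch only: real packets have headers and checks."""
    packets, current = [], []
    for w in words:
        if w == 0:
            if current:            # a zero closes the current packet
                packets.append(current)
                current = []
        else:
            current.append(w)      # non-zero words belong to a packet
    if current:
        packets.append(current)
    return packets

# Two packets separated by idle zero samples:
stream = [0, 0, 0x11, 0x22, 0, 0, 0x33, 0]
print(split_packets(stream))       # [[17, 34], [51]]
```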
Chaining
Like MIDI devices, each device can have an input and/or an output. Devices can be daisy-chained (but there is no "Thru") so several instruments (keyboards, knob panels, pedals, expanders, mixers, whatever) share a single stream that can be recorded by the last device in the chain.
Of course it is possible to build multiplexer-like devices with multiple inputs that coalesce all the data into a coherent stream, but they will have to rewrite all the timecodes.
In a chain, the first device receives no information, so it outputs its own timecode, which synchronises the rest of the chain. The timecode could come from an independent generator device, or from a master DAW. The rule is simple though: if a device does not receive an external timecode within one second, it outputs its own local timecode; otherwise it follows and synchronises to the external timecode (using a software PLL, for example). Of course, the received timecode must be valid: say, 3 consecutive timecodes with coherent values (they must increase monotonically at a reasonable rate).
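The rule above could be sketched as a small state machine. The class name, the tick-rate bound and the API below are illustrative assumptions, not part of the spec:

```python
class TimecodeFollower:
    """Follow an external timecode only after 3 consecutive coherent
    values (monotonically increasing at a reasonable rate); fall back
    to the local timecode after 1 second without valid external input."""
    MAX_STEP = 48_000          # assumed "reasonable rate" bound, in ticks

    def __init__(self):
        self.last = None
        self.coherent = 0      # consecutive coherent timecodes seen
        self.silent_s = 0.0    # seconds since last external timecode

    def on_external(self, tc):
        if self.last is not None and 0 < tc - self.last <= self.MAX_STEP:
            self.coherent += 1
        else:
            self.coherent = 1  # incoherent step: restart the count
        self.last = tc
        self.silent_s = 0.0

    def tick(self, dt_s):
        self.silent_s += dt_s

    def locked(self):
        """True when the device should follow the external timecode."""
        return self.coherent >= 3 and self.silent_s < 1.0
```

A device would output its own local timecode whenever `locked()` is false, which covers both the "first device in the chain" case and the "source went silent" case with the same test.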
The problem with chains is when/if a device in the middle malfunctions... or if one link is broken. This also cuts off all the devices upstream of the failure.
Bandwidth
At the end of the chain, the last device receives the sum of all the streams generated upstream. The result might exceed the available bandwidth. Several techniques are borrowed from packet-switched networks.
- A decent FIFO is recommended at the input of each device. 4 KiB seems to be a minimum. Too large a FIFO would create "bufferbloat" and losses in subsequent devices that don't have such a large FIFO. 16 KiB is a decent value that will hopefully never fill up completely. BTW: ideally, the FIFO should not need to be larger than the largest packet the device can send, but some margin always helps.
- If the input FIFO overflows, the next incoming packets are discarded entirely.
- This is why each packet should be "standalone" and not depend on the next or previous packets.
- For a link with limited capacity, activity LEDs could indicate FIFO usage and average bandwidth occupation.
- Eventually, a device could have "niceness" settings such as its refresh rate (10, 100, 1000 Hz?) and the prioritisation of its own packets (like: drop X% of its own packets and (100-X)% of the incoming ones).
- Devices closer to the end of the chain mechanically get higher priority, because their data have fewer chances of being dropped.
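The whole-packet admission policy from the list above could look like this sketch. `PacketFifo` is a hypothetical name, and the 16 KiB default is just the value suggested above:

```python
class PacketFifo:
    """Input FIFO with whole-packet admission: a packet is accepted only
    if it fits entirely, otherwise it is discarded whole (packets are
    standalone, so a dropped packet harms nothing else)."""
    def __init__(self, capacity=16 * 1024):   # 16 KiB, as suggested
        self.capacity = capacity
        self.used = 0
        self.queue = []
        self.dropped = 0                      # could drive an activity LED

    def push(self, packet):
        if self.used + len(packet) > self.capacity:
            self.dropped += 1                 # overflow: drop whole packet
            return False
        self.queue.append(packet)
        self.used += len(packet)
        return True

    def pop(self):
        pkt = self.queue.pop(0)
        self.used -= len(pkt)
        return pkt
```

Note that the check happens before anything is queued, so a partially received oversize packet never wedges the buffer, which is exactly why standalone packets matter.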
Timing
The stream contains timecodes that may be ignored during a real-time performance but allow recording and playback with plain files (as MIDI streams already do). The timecode format allows simple and direct timeshifting (because it is plain fractional binary) and can wrap around when the range is exceeded (so you can record more than 9 h of performance). Only timecodes that leap forward (within a decent window) are legal; the others get discarded (unless too many are received during 1 second).
Well, there is also the exception of the null timecode, value #0000, indicating "no internal clock", for example for dumb sources with no input. This is not recommended, because the data/packets could be discarded, and the sink (or the next device in the chain) would have to rewrite the timecode (extra effort that would be best spared for more useful things).
But there is also the question of a source that does not increment its own timecode for a while, to send a large chunk of data for example: it would clog the chain if the chunk exceeds the FIFO size.
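The "leaps forward within a decent window, even across the wrap point" test can be done with serial-number-style modular arithmetic. The timecode width and window size below are assumptions for illustration; only the reserved null value comes from the text:

```python
BITS = 32                  # assumed timecode width, for illustration
MOD = 1 << BITS
HALF = 1 << (BITS - 1)

def leaps_forward(prev, new, window=HALF):
    """Wrap-aware legality test: `new` is legal if it is strictly ahead
    of `prev` by less than `window`, even when the counter has wrapped.
    The null timecode means "no internal clock" and is never legal here."""
    if new == 0:
        return False
    delta = (new - prev) % MOD     # modular distance handles wrap-around
    return 0 < delta < window
```

With this test, a wrapped timecode just past zero is still "ahead" of one near the top of the range, so recordings longer than the raw range stay consistent.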
Here, the timecodes also replace the "sequence numbers" found in other formats/protocols. This is due to several factors:
- There is more than one source, their streams get interleaved, and no device in a chain should have to rewrite the packet headers anyway;
- Each source could have its own sequence number, but the sources get synchronised such that the timecodes should not go backwards anyway;
- Each packet must be standalone, so there is no fragmentation to keep track of.
This results in a shorter/smaller header.
Compression
Since all the packets must be standalone and no resend is possible, the compressed data can't rely on time-based deltas. The #Recursive Range Reduction (3R) HW&SW CODEC looks ideal for this situation. A baseline compaction/decompaction routine would be developed to handle 256 16-bit numbers. Negative numbers would be encoded as pairs of 16-bit numbers with one of them zero, the result being the subtraction of these numbers (which can be positive or negative, and 3R handles the 0 value well).
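The signed-pair trick described above might be sketched like this (illustrative only; the actual bit packing is defined by the #3R codec itself):

```python
def encode_signed(x):
    """Map a signed value onto the (positive, negative) pair described
    above: two 16-bit magnitudes, one of them always zero, whose
    subtraction recovers the original value. Zeros compress well in 3R."""
    assert -0xFFFF <= x <= 0xFFFF
    return (x, 0) if x >= 0 else (0, -x)

def decode_signed(pair):
    """Recover the signed value by subtracting the two magnitudes."""
    pos, neg = pair
    return pos - neg
```

The point of the scheme is that one of the two numbers is always 0, a value the 3R tree encodes very cheaply, so the sign costs almost nothing in the compressed stream.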
Data sinks
Since the stream is potentially asynchronous and lossy, the data sinks (the expanders, oscillators, etc.) must correctly interpolate the data at their internal, higher sampling rate.
Any source could fall offline at any moment, so failsafes, watchdogs, timers, etc. must detect and correct "anomalous conditions". Proper operation should resume within 1 second. After all, you never know when your mate will trip on a dangling wire, and you don't want that to end the whole show.
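A minimal sketch of such a failsafe at a sink, assuming a hypothetical `SourceWatchdog` that reverts a channel to a safe value after the 1-second timeout mentioned above:

```python
class SourceWatchdog:
    """Per-channel failsafe at a data sink: if the source goes silent
    for 1 second, revert to a safe value instead of freezing on the
    last received one. Names and API are illustrative assumptions."""
    TIMEOUT_S = 1.0

    def __init__(self, safe_value=0):
        self.safe = safe_value
        self.value = safe_value
        self.silent_s = 0.0

    def on_sample(self, v):
        self.value = v            # fresh data: feed the watchdog
        self.silent_s = 0.0

    def tick(self, dt_s):
        self.silent_s += dt_s
        if self.silent_s >= self.TIMEOUT_S:
            self.value = self.safe   # anomalous condition: fail safe
```

When the source comes back, the next `on_sample()` resumes normal operation immediately, so a tripped-over wire costs at most a second of stale output.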
Naming
The sinks "attach" or "listen" to a given channel, associated with a given source. Each channel has a UTF-8 label, sent regularly by the source or upon receiving a "ping"/"enumerate" packet, so users don't have to rely on numbers only. If a source receives a packet that contains its own channel ID, it must change its ID to a different random one to resolve the collision: this is a dynamic arbitration system, so the label is what really matters. Thus, as long as all the labels are different, you could plug any device anywhere in the chain and forget about the low-level IDs; they would be random anyway (unless you decide to fix them statically). Eventually, a message could order a given channel to change its ID or label for convenience.
With 16 bits for the channel ID (leaving 16 bits for type and flags), there is little chance that 2 random IDs will collide (see the Birthday Paradox, however). With half a dozen devices in a chain, and a decent re-allocation scheme with decent entropy, operation should be pretty smooth. Even crazy-long chains should work: bandwidth and latency will become the problem (particularly with a store & forward mechanism) long before ID collisions become critical.
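The Birthday-Paradox odds can be checked numerically. A small estimate for random 16-bit IDs (the exact product formula, nothing protocol-specific):

```python
def collision_probability(n_devices, id_bits=16):
    """Probability that at least two of n_devices draw the same random
    channel ID out of 2**id_bits (classic birthday-problem product)."""
    space = 2 ** id_bits
    p_no_collision = 1.0
    for i in range(n_devices):
        p_no_collision *= (space - i) / space
    return 1.0 - p_no_collision

# Half a dozen devices on 16-bit IDs: roughly 0.02% chance of a clash,
# and the re-allocation scheme resolves even that.
print(collision_probability(6))
```

For 6 devices the probability is about 15/65536 ≈ 0.023%, and it only becomes serious with hundreds of simultaneous channels, which supports the claim that bandwidth and latency will hurt first.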
Anyway, each source MUST have a means to input its UTF-8 label. It could be a simple push-button on the front panel that enables the reception of a rename command packet, or a full-blown keyboard, or whatever...
The label could be "up to" 255 bytes long, though smaller implementations might only store 32 or 64 bytes...
-o-O-0-O-o-
Wow, the more I write about it, the more it draws from two decades of design experience. The #PEAC and #3R algorithms were designed for this sort of application and purpose, so I'm glad they all come together at last. The protocol does not look like Open Sound Control at all: OSC relies on verbose textual address patterns, while N00N is raw binary and lightweight to interpret with limited CPU resources. Maybe I'll succeed one day in implementing it in my #RD1000?