High-end audio gear is designed to be as linear as possible, to produce the most accurate sound. Non-linearities lead to clipping, distortion, buzzing, rattles, and all kinds of things cheap speakers do. That's not good for good tunes!
I want to use machine learning and feedback to 'learn' these non-linearities, so even cheap speakers can rival good ones.
If we learn exactly which input sounds cause rattles, we can modify them to reduce or eliminate the unwanted sounds. Doing that by hand would be far too slow, but machine learning will do it automatically.
It won't be perfect. There are some things physics limits us on - for example, a bigger diameter speaker can move more millilitres of air, and no algorithm fixes that - but we can get far closer.
Long term, this idea could also make good speakers even better, letting them be smaller, lighter, cheaper, or better in any way that today's heavy, precisely machined magnetic components prevent.
It turns out that in the setup from my diagram in the previous update, the microphone quality doesn't matter. Here's why:
I play a sound x through the good speaker (G) and record it with the shitty microphone (M). I then do the same with the modified sound y and the bad speaker (B), and analyse the difference.
Turns out I'm analysing:
M(G(x)) - M(B(y))
And since M is present in both terms, after many repeats the feedback loop drives that difference towards zero, leaving:
M(G(x)) ~= M(B(y))
and therefore:
G(x) ~= B(y)
So the microphone got cancelled out! (Note this is only strictly true if function M is invertible, but we'll assume it approximately is...)
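To make the cancellation concrete, here's a minimal numeric sketch. The functions M, G, and B below are entirely made up (tanh for the mic's colouring, a cubic for the bad speaker's compression) - they just need M to be invertible-ish and the loop to nudge y until the recorded difference vanishes:

```python
import numpy as np

# Hypothetical toy functions -- not real speaker or microphone models.
def M(s):           # "microphone": invertible but heavily non-linear colouring
    return np.tanh(2.0 * s)

def G(s):           # "good speaker": close to ideal
    return s

def B(s):           # "bad speaker": compresses loud parts (a non-linearity)
    return s - 0.1 * s**3

x = 0.8             # one input sample, for illustration
y = x               # start with the unmodified signal

# Nudge y to shrink the recorded difference M(G(x)) - M(B(y)).
for _ in range(200):
    err = M(G(x)) - M(B(y))
    y += 0.5 * err  # small correction step

# We never measured M or inverted it, yet once the recorded
# difference is ~0, G(x) and B(y) agree too.
print(abs(M(G(x)) - M(B(y))))
print(abs(G(x) - B(y)))
```

Both printed values end up tiny: driving the mic-coloured difference to zero drove the actual speaker difference to zero as well.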
In fact... Don't... Because that's one of those Wikipedia pages that's only good if you already have a PhD in the topic... Also, all that maths only applies to linear systems, which is the very assumption we don't want to make about our speaker!
The summary is this though: when you think you have figured out what the speaker is doing (i.e. for a given input signal, you know what sound it should make, what sound it really makes, and hence the 'error'), if you apply the inverse signal to cancel the error and the problem gets worse instead of better, you've fallen foul of the Nyquist stability criterion.
Luckily, *for a given frequency*, if this happens and you add the error instead of subtracting it, you're guaranteed to end up with a stable system... Magic...
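Here's a toy sketch of that sign-flip trick at a single frequency. Everything here is hypothetical: `gain` stands in for the speaker's unknown (possibly phase-inverted) response at one frequency, and the loop flips the correction's sign the moment subtracting the error makes things worse:

```python
# Hypothetical single-frequency correction loop.
def converge(gain, target=1.0, mu=0.2, steps=100):
    c = 0.0                      # correction amplitude we control
    sign = 1.0
    prev = abs(target - gain * c)
    for _ in range(steps):
        err = target - gain * c
        c += sign * mu * err     # try subtracting the error...
        now = abs(target - gain * c)
        if now > prev:           # ...error grew: phase is flipped, add instead
            sign = -sign
        prev = now
    return abs(target - gain * c)

print(converge(+0.8))  # normal phase: converges
print(converge(-0.8))  # inverted phase: the sign flip rescues it
```

Both calls converge to a near-zero residual, which is the per-frequency guarantee described above.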
I found my rubbish speaker... I found a microphone... I found a good speaker...
The goal is to play a sound with the good speaker. Then play one with the rubbish speaker. Then figure out the main differences in the sound by doing an FFT and looking for the most significant differences.
Next, calculate a signal which will cancel the differences, and repeat the whole process with the new signal added in for the trash speaker only. Over time, the differences will diminish towards zero.
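One iteration of that loop can be sketched like this. The "recordings" are faked: `bad_speaker` is a made-up weak, slightly non-linear stand-in, and the good speaker is assumed ideal, so this only illustrates the FFT-compare-correct-repeat structure, not real hardware:

```python
import numpy as np

# Hypothetical stand-in for the rubbish speaker: weaker, slightly non-linear.
def bad_speaker(s):
    return 0.7 * s - 0.1 * s**3

fs, n = 8192, 1024
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 440 * t)     # test tone (440 Hz sits exactly on a bin)
good = x                            # pretend the good speaker is ideal

# Step 1: FFT both recordings, find the most significant differences.
diff = np.fft.rfft(good) - np.fft.rfft(bad_speaker(x))
worst = np.argsort(np.abs(diff))[::-1][:5]   # 5 biggest spectral differences

# Step 2: build a cancelling signal from just those bins.
corr_spec = np.zeros(n // 2 + 1, dtype=complex)
corr_spec[worst] = diff[worst]
y = x + np.fft.irfft(corr_spec, n)  # new input for the rubbish speaker only

# One pass already shrinks the error; repeating drives it down further.
before = np.linalg.norm(good - bad_speaker(x))
after = np.linalg.norm(good - bad_speaker(y))
print(before, after)
```

A single pass doesn't reach zero, because the cubic term pushes some energy into new harmonics, which is exactly why the process has to repeat.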
Yep. It does something similar, but despite the Wikipedia description using the phrase "linear", it actually models the system as linear time-invariant, so the non-linearities are exactly what it can't capture.
Smaart is a suite of tools, but most of them revolve around finding the impulse response from one or more speakers in a room (and hence the transfer function, frequency response and phase response).
I'm effectively trying to do this, but without assuming the system is linear, and therefore, I hope to get better results.
The disadvantage is that for a non-linear system, the search space to fully define the system is theoretically infinite, so we still need to make some assumptions about dimensionality to reduce the search space, and use something better than a brute-force solution to search it. I'm still expecting automatic "training" of the system to take days/weeks though.
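One common way to cut the dimensionality down is to assume a low-order polynomial (memoryless Volterra-style) model, which turns the infinite search into a few coefficients fitted by least squares. The "true" speaker below is of course a made-up stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" speaker we are trying to identify.
def speaker(s):
    return 0.8 * s - 0.15 * s**3

s = rng.uniform(-1, 1, 2000)        # training excitation
r = speaker(s)                      # what the microphone would record

# Assumption: the speaker is a low-order odd polynomial, r ~= c1*s + c3*s**3.
# That reduces the search to two numbers, found by least squares.
A = np.stack([s, s**3], axis=1)
c1, c3 = np.linalg.lstsq(A, r, rcond=None)[0]
print(c1, c3)
```

Under that assumption the fit recovers the coefficients almost exactly; the real question is how many terms (and how much memory, i.e. dependence on past samples) a physical speaker needs, which is where the days/weeks of training come in.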