I've been alerted to a lightweight alternative to the FFT called the Goertzel algorithm, which evaluates individual frequency bins instead of the whole spectrum. If the fundamental frequency never has to go above about 4.5 kHz, a standard 8-bit 16 MHz Arduino may be fast enough. This is based on the ADC's maximum sample rate of about 9.6 kSps and the Nyquist theorem, which says only frequencies below half the sample rate (here, under 4.8 kHz) can be captured. A sufficiently sharp 4.5 kHz lowpass filter ahead of the ADC should do just fine.
Goertzel Algorithm: An efficient way to implement FFT using Arduino
If this is not fast enough, even the meanest Teensy 3 and up will handle it easily.
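For anyone who wants to see what the Goertzel recurrence looks like, here is a minimal sketch in plain C++ that should also compile in an Arduino sketch. It is not taken from the linked article; the function name `goertzelPower` and its parameters are my own illustrative choices, and it assumes you have already collected a block of samples at a known, steady sample rate.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: squared magnitude of the single DFT bin nearest
// to targetHz, computed with the Goertzel recurrence.
// samples      : block of n real-valued samples
// sampleRateHz : rate the samples were taken at (assumed constant)
float goertzelPower(const float* samples, std::size_t n,
                    float targetHz, float sampleRateHz) {
    const float kPi = 3.14159265358979f;

    // Round the target frequency to the nearest integer bin index.
    const float k = std::floor(0.5f + (n * targetHz) / sampleRateHz);
    const float omega = 2.0f * kPi * k / n;
    const float coeff = 2.0f * std::cos(omega);

    // Second-order recurrence: one multiply and two adds per sample.
    float s1 = 0.0f;  // s[i-1]
    float s2 = 0.0f;  // s[i-2]
    for (std::size_t i = 0; i < n; ++i) {
        const float s0 = samples[i] + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }

    // Squared magnitude of the bin; no square root or phase needed
    // if you only want to compare against a threshold.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```

The bin width is the sample rate divided by the block length, so 96 samples at 9600 Sps gives 100 Hz bins. A full-scale sine sitting exactly on a bin should come out near (n/2)² and other bins near zero. Longer blocks give finer frequency resolution but take longer to fill, which is the main trade-off to tune.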