Meta has announced a new AI-based audio codec, “EnCodec”. According to Meta, the codec compresses audio to roughly one tenth the size of a 64 kbps MP3 without any loss of quality. Meta says the technology can dramatically improve voice quality over low-bandwidth connections, such as calls in areas with unreliable service.
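To put that comparison in concrete terms, here is a rough back-of-the-envelope sketch in Python. It takes the 64 kbps MP3 baseline and a tenfold reduction at face value; the one-minute clip length and the helper name `size_bytes` are illustrative assumptions, not figures from Meta.

```python
def size_bytes(bitrate_bps, seconds):
    """Approximate stream size: bits per second times duration, in bytes."""
    return bitrate_bps * seconds // 8

MP3_BPS = 64_000              # the 64 kbps MP3 baseline cited by Meta
REDUCED_BPS = MP3_BPS // 10   # assumption: "ten times smaller" taken literally
CLIP_SECONDS = 60             # assumption: a one-minute clip

mp3_size = size_bytes(MP3_BPS, CLIP_SECONDS)
reduced_size = size_bytes(REDUCED_BPS, CLIP_SECONDS)

print(f"MP3 at 64 kbps:  {mp3_size} bytes per minute")
print(f"Tenfold smaller: {reduced_size} bytes per minute")
```

Under these assumptions, a minute of audio shrinks from about 480 KB to about 48 KB.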
The research is described in the paper “High Fidelity Neural Audio Compression”, and Meta has published a summary on its blog.
The heart of the technology is a three-part system trained to compress audio to a desired target size. First, an encoder converts the uncompressed data into a lower-frame-rate “latent space” representation. A quantizer then compresses this representation to the target size, and the compressed signal is sent over the network or saved to disk. Finally, a decoder, a neural network running in real time on a single CPU, converts the compressed data back into audio.
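The encoder, quantizer, and decoder stages can be illustrated with a toy sketch in Python. This is not Meta's implementation: fixed arithmetic stands in for the trained neural networks, and the names `FRAME`, `LEVELS`, `encode`, `quantize`, and `decode` are hypothetical, chosen only to mirror the three stages.

```python
import math

FRAME = 4     # hypothetical: samples summarized by one latent value
LEVELS = 16   # hypothetical: codebook size (4 bits per latent value)
STEP = 2.0 / (LEVELS - 1)  # spacing of codebook values across [-1, 1]

def encode(samples):
    """Stand-in encoder: average each block of FRAME samples,
    giving a representation at a lower frame rate."""
    blocks = [samples[i:i + FRAME] for i in range(0, len(samples), FRAME)]
    return [sum(block) / len(block) for block in blocks]

def quantize(latents):
    """Stand-in quantizer: replace each latent value in [-1, 1]
    with the index of the nearest codebook value."""
    return [min(LEVELS - 1, max(0, round((x + 1.0) / STEP))) for x in latents]

def decode(codes, n_samples):
    """Stand-in decoder: look up each code and hold its value
    for FRAME output samples."""
    out = []
    for code in codes:
        out.extend([code * STEP - 1.0] * FRAME)
    return out[:n_samples]

# One period of a sine wave as a stand-in for uncompressed audio.
signal = [math.sin(2 * math.pi * t / 64) for t in range(64)]

codes = quantize(encode(signal))        # what would be sent or stored
restored = decode(codes, len(signal))   # what the receiver plays back

worst_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"{len(signal)} samples -> {len(codes)} codes, worst error {worst_error:.3f}")
```

In this sketch, only the small integer codes would cross the network or be written to disk. In a learned codec, trained networks take the place of the averaging and hold steps, which is where the quality at low bitrates would come from.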
Meta’s use of discriminators is the key to compressing the audio as much as possible without losing the features and key elements that make the signal recognizable.
Using neural networks to compress and decompress audio is nothing new, especially for speech compression. Still, Meta’s researchers say they are the first to apply the technology to 48 kHz stereo audio, which is typical of music files distributed over the internet (a sampling rate slightly better than CD’s 44.1 kHz).
As an application, this AI-powered super-compression of voice may support faster, higher-quality calls when network conditions are poor. Ultimately, the technology could provide a rich metaverse experience that doesn’t require significant bandwidth improvements.
For now, Meta’s new technology is still in the research stage. Still, it hints at a future where high-quality audio is available with less bandwidth, which is good news for mobile broadband providers whose networks are strained by streaming media.