Wav2vec-U: Facebook’s New AI Speech Recognition System That Doesn’t Require Human Participation

Speech recognition is a key part of Big Tech companies’ work on AI. The technology powers digital assistants on phones, in cars, and in the smart speakers in our homes. Yet despite its ubiquity, speech recognition is still a work in progress.

Facebook has now announced a major breakthrough in speech recognition: an AI system that learns without human-transcribed training data.

Modern speech recognition systems are trained on audio recordings of conversations together with their text transcripts. These transcripts are written manually by people, which is slow and tedious work, because training an AI model requires a huge amount of such material.

Facebook’s system, called Wav2vec-U (Wav2vec Unsupervised), only needs to be “fed” recorded speech in the target language and unrelated text in that language. The text does not have to be a transcript of the recordings; from these two inputs, the system learns on its own until it “understands” individual words and phrases.

The Facebook model is built around a generative adversarial network (GAN), in which two networks, a “generator” and a “discriminator,” are trained against each other in a feedback loop. The generator takes representations of the recorded speech and transcribes them into sequences of phonemes (the basic sound units of a language), acting as a kind of translator.

The discriminator, in turn, is shown real text written by people alongside the generator’s output and learns to tell the two apart. Its feedback pushes the generator toward transcriptions that look more like real text, and the process repeats until the discriminator can no longer distinguish the generator’s output from real text.
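The adversarial loop described above can be illustrated with a deliberately tiny toy. This is a minimal sketch, not wav2vec-U’s code: instead of speech representations and phoneme sequences, the “real text” is a hypothetical stream of one-dimensional numbers drawn from a Gaussian, the generator is a linear function of random noise, and the discriminator is plain logistic regression. The names `train_toy_gan`, `REAL_MEAN`, and `REAL_STD` are illustrative assumptions.

```python
import math
import random

# Toy stand-in for "real, human-written text" (hypothetical, not speech data).
REAL_MEAN, REAL_STD = 1.0, 1.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_toy_gan(steps: int = 3000, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, -1.0  # generator G(z) = a*z + b, starts with the wrong mean
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + c) = P(x is real)
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, REAL_STD) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)),
        # i.e. learn to tell real samples from generated ones.
        gw = gc = 0.0
        for x in real:
            g = -(1.0 - sigmoid(w * x + c))
            gw += g * x
            gc += g
        for x in fake:
            g = sigmoid(w * x + c)
            gw += g * x
            gc += g
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: minimize -log D(G(z)), i.e. try to fool the
        # freshly updated discriminator.
        ga = gb = 0.0
        for z, x in zip(noise, fake):
            g = -(1.0 - sigmoid(w * x + c)) * w  # derivative of -log D(x) w.r.t. x
            ga += g * z
            gb += g
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print(f"generator mean {b:.2f} vs real mean {REAL_MEAN}; "
          f"D(real mean) = {sigmoid(w * REAL_MEAN + c):.2f}")
```

As training proceeds, the generator’s mean drifts toward the real data and the discriminator’s output settles near 0.5, meaning it can no longer tell the two sources apart. In wav2vec-U the generator is conditioned on speech rather than random noise and the discriminator compares phoneme sequences, but the alternating two-step update is the same basic idea.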

Facebook engineers have successfully trained Wav2vec-U to recognize speech in Swahili, Kyrgyz, and Crimean Tatar. The system makes 63% fewer errors than the previous best unsupervised system, and training took only 9.6 hours of speech and 3,000 written phrases. To accelerate further development, the company has released the code on GitHub.
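Speech recognition errors are usually counted as the edit distance between a system’s transcript and a reference transcript (the word error rate, or the phoneme error rate when the units are phonemes), and “63% fewer errors” is a relative reduction of that rate. The minimal sketch below shows the arithmetic with hypothetical example sentences and rates; it does not reproduce Facebook’s evaluation. For instance, a hypothetical drop from a 30% to an 11.1% error rate is a 63% relative reduction.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over tokens (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete a reference token
                           cur[j - 1] + 1,            # insert a hypothesis token
                           prev[j - 1] + (r != h)))   # substitute, or match for free
        prev = cur
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Errors per reference token, e.g. word error rate for whitespace tokens."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hypothesis.split()) / len(ref_tokens)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old system's errors that the new system avoids."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    reference = "the cat sat on the mat"  # hypothetical example
    print(error_rate(reference, "the cat sat on mat"))
    print(relative_reduction(0.30, 0.111))
```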

Amazon, Otter.ai, Google, Deepgram, Microsoft, and Verbit offer their own speech recognition systems, but unlike Facebook’s new system, all of them require human-transcribed data for training.

Meet Vishak, TechLog360's Content Editor and tech enthusiast. With a Computer Science degree and a passion for all things tech, Vishak delivers the latest in hardware, apps, and games with expertise. Trusted for his in-depth reviews and industry insights, he's your guide to the digital world. Off-duty, he's exploring photography and virtual gaming landscapes.
