In December 2020, Google announced 'Look To Speak', an application designed to help people with disabilities speak using only their eyes. Now, two years on, the technology has advanced and evolved considerably, and Google is planning to integrate it into its Google Assistant products.
In the future, you will no longer have to say 'Ok Google' to trigger Google Assistant; instead, it will activate automatically when it detects your gaze.
Google wants Assistant to interact in as human-like and natural a way as possible, including being able to start a conversation when you make eye contact. To achieve this, it announced 'Look and Talk' at Google I/O 2022 and now explains that this is the first time one of its devices has simultaneously analyzed audio, video, and text.
The feature currently works only on the Nest Hub Max and only in English. With this new technology, activation happens by looking directly at the device: the Nest Hub Max detects the posture and expression of the person in front of the screen who wants to start a conversation. To do this, the orientation of the head, the gaze, and even the direction of the subject's body are analyzed.
As the Google AI blog notes, “Using eight machine learning models together, the algorithm can differentiate intentional interactions from passing glances in order to accurately identify a user’s intent to engage with Assistant. Once within 5ft of the device, the user may simply look at the screen and talk to start interacting with the Assistant.”
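To make the idea concrete, here is a minimal sketch of how several visual signals could be fused into a single engagement decision. It is purely illustrative: the thresholds, weights, and field names are assumptions for the example, not Google's actual models or values.

```python
# Illustrative only: a simplified sketch of combining proximity, gaze, and
# orientation signals into an "intent to engage" decision. All weights and
# thresholds are hypothetical, not Google's implementation.
from dataclasses import dataclass


@dataclass
class VisualSignals:
    distance_ft: float         # estimated distance from the device
    gaze_on_screen: float      # 0..1 confidence the gaze is on the screen
    head_facing_device: float  # 0..1 confidence the head faces the device
    body_facing_device: float  # 0..1 confidence the body faces the device


def looks_engaged(s: VisualSignals, threshold: float = 0.7) -> bool:
    """Return True if the combined visual evidence suggests intent to engage."""
    # The reported feature only activates within roughly 5 ft of the device.
    if s.distance_ft > 5.0:
        return False
    # A weighted sum stands in for the multi-model fusion described in the
    # blog post; the weights here are arbitrary.
    score = (0.5 * s.gaze_on_screen
             + 0.3 * s.head_facing_device
             + 0.2 * s.body_facing_device)
    return score >= threshold


# A user standing close and looking at the screen triggers engagement.
print(looks_engaged(VisualSignals(3.0, 0.9, 0.8, 0.6)))  # True
# A passing glance from across the room should not trigger the device.
print(looks_engaged(VisualSignals(8.0, 0.9, 0.9, 0.9)))  # False
```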
This visual information is captured on video, so the device must have a camera and be placed in a position with a good view of the room. Added to this is an analysis of the audio picked up by the microphones, which must determine that the person is talking to Google and not to someone else in the room.
In addition, the Assistant checks the audio input against Google's Voice Match, so it won't interact with anyone whose voice it doesn't recognize. Tone of voice, speaking speed, and some contextual signals are also analyzed to work out whether or not the user wants to make a query.
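Again purely as an illustration, the sketch below shows how such a pipeline might gate a response on both visual engagement and speaker verification. The function names, fields, and thresholds are hypothetical and are not Google's APIs.

```python
# Illustrative only: gating a response on visual engagement plus a
# Voice Match-style speaker check. Names and thresholds are assumptions.
from dataclasses import dataclass


@dataclass
class AudioSignals:
    speaker_match: float    # 0..1 confidence the voice matches an enrolled user
    directed_speech: float  # 0..1 confidence the speech is aimed at the device


def should_respond(visually_engaged: bool, audio: AudioSignals,
                   match_threshold: float = 0.8,
                   intent_threshold: float = 0.6) -> bool:
    """Respond only if the user is looking at the device, the voice is
    recognized, and the speech appears to be directed at the device."""
    if not visually_engaged:
        return False
    if audio.speaker_match < match_threshold:
        # Unrecognized voice: the assistant stays silent.
        return False
    return audio.directed_speech >= intent_threshold


# An enrolled user looking at the device and asking it a question.
print(should_respond(True, AudioSignals(0.95, 0.8)))  # True
# Someone talking to another person in the room, not to the device.
print(should_respond(True, AudioSignals(0.95, 0.2)))  # False
```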
It is quite an advance that the smart speakers now appearing in many homes can not only understand spoken sentences and entire conversations but also analyze gestures and non-verbal communication.
Although this novelty is currently limited to a small number of devices and only supports conversations in English, Google will later expand it to other languages and regions. Google Assistant currently works in 95 countries and recognizes more than 29 languages.