Google Voice Search Gets New Acoustic Technology For Better Recognition, Even In Noise

Google keeps developing new technology, and not only in hardware and platforms such as Google Glass and Android. It is also improving products such as Google Voice Search, with better systems for analysing and recognising voice commands in a variety of environments.


One such improvement has now made its way into Google Voice Search: new acoustic models built with Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. These models are an extension of the recurrent neural networks (RNNs) used for voice recognition. Here’s what Google said about the system it has just introduced:

RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound and from an /m/ sound before. Try saying it out loud – “museum” – it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN which, through memory cells and a sophisticated gating mechanism, memorizes information better than other RNNs. Adopting such models already improved the quality of our recognizer significantly. 
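To make the gating idea in Google's description a little more concrete, here is a minimal sketch of a single LSTM cell with scalar state, written in plain Python. It is an illustration only and not Google's model: the function names (`lstm_step`, `run_lstm`), the weight layout, and the scalar simplification are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a scalar LSTM cell (hypothetical toy layout).

    `weights` maps a gate name to a tuple (w_input, w_hidden, bias).
    Returns the new hidden state and the new memory cell value.
    """
    def pre_activation(gate):
        w_x, w_h, b = weights[gate]
        # Feedback loop: the previous output h_prev feeds back in.
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))        # how much old memory to keep
    write = sigmoid(pre_activation("input"))          # how much new content to store
    expose = sigmoid(pre_activation("output"))        # how much memory to reveal
    candidate = math.tanh(pre_activation("candidate"))  # proposed new content

    c = forget * c_prev + write * candidate  # update the memory cell
    h = expose * math.tanh(c)                # gated output
    return h, c


def run_lstm(inputs, weights):
    """Feed a sequence through the cell, starting from empty state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in inputs:
        h, c = lstm_step(x, h, c, weights)
        outputs.append(h)
    return outputs
```

With weights that keep the forget gate open, an input seen at the first step still shapes the output at the next step, which is the kind of temporal dependency the quote describes for the sounds in “museum”.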

And here is how Google describes the traditional approach it was using after 2012:

In a traditional speech recognizer, the waveform spoken by a user is split into small consecutive slices or “frames” of 10 milliseconds of audio. Each frame is analyzed for its frequency content, and the resulting feature vector is passed through an acoustic model… The recognizer then reconciles all this information to determine the sentence the user is speaking. If the user speaks the word “museum” for example – /m j u z i @ m/ in phonetic notation – it may be hard to tell where the /j/ sound ends and where the /u/ starts, but in truth the recognizer doesn’t care where exactly that transition happens: All it cares about is that these sounds were spoken.
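The pipeline in that passage can be sketched roughly as follows, assuming plain Python and a list of audio samples. The sketch splits a waveform into 10-millisecond frames, measures the frequency content of a frame with a naive discrete Fourier transform, and then shows why exact sound boundaries do not matter: per-frame labels are collapsed using the merge-repeats-and-drop-blanks rule commonly associated with CTC outputs. All function names, the 8 kHz sample rate, and the `_` blank symbol are assumptions for illustration; this is not Google's implementation.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut a waveform into consecutive, non-overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, frame_len)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(frame))
        spectrum.append(abs(total))
    return spectrum


def collapse_frame_labels(labels, blank="_"):
    """Merge repeated labels, then drop blanks (CTC-style collapse)."""
    result = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            result.append(label)
        previous = label
    return result


# Two hypothetical frame-by-frame labellings of "museum" that place the
# /j/ to /u/ transition at different frames.
alignment_a = ["m", "m", "j", "u", "u", "u", "z", "i", "@", "m"]
alignment_b = ["m", "_", "j", "j", "u", "z", "z", "i", "@", "@", "m", "m"]
```

Both alignments collapse to the same sequence /m j u z i @ m/, so the exact frame where one sound gives way to the next makes no difference to the result.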

You can test Google’s upgraded digital ears on both Android and iOS now.

Simranpal Singh
With a decade-long journey in the tech industry, Simranpal has reported on technology for various reputable publications. He currently works as a Web Developer at RightNode Media and writes for GoAndroid as a hobby. He enjoys travelling, is always excited about new tech trends, and actively contributes to GizmoChina and GChromecast Hub.
