Sensory Boosts Performance of Embedded Wake Word and Speech Recognition by Infusing Smarter AI

Sensory's TrulyHandsfree with new shallow learning technique stays small and gets smarter, more accurate, and more robust to noise

News provided by Sensory, Apr 27, 2017, 08:30 ET

SANTA CLARA, Calif., April 27, 2017 /PRNewswire/ -- Sensory, a Silicon Valley-based company focused on improving the user experience and security of consumer electronics through state-of-the-art embedded AI technologies, today announced that it has made significant updates to the embedded AI in its TrulyHandsfree™ technology to dramatically boost its performance and accuracy, while staying small and low power.

Introduced in 2009, TrulyHandsfree revolutionized voice user interfaces by offering the first commercially successful embedded small-vocabulary speech recognition system to feature an always-listening wake word. Incorporating Sensory's smartest and most efficient deep neural network technologies to date, TrulyHandsfree 5.0 takes embedded voice interfaces to new heights, offering an on-device voice user interface experience that is more natural and intuitive than ever before. At the same time, a new shallow learning approach compresses the models so that they run at ultra-low power with minimal memory and MIPS. Today, TrulyHandsfree can be found in leading mobile phones, sports cameras, IoT devices, and even toys!

Smarter Speech Activation for Improved Accuracy

In the early days of voice wakeup technology, accuracy concerns were the major factor limiting mass adoption. The risk of false fires had to be minimized to ensure that devices didn't mistakenly activate at inappropriate times. TrulyHandsfree was the first solution capable of offering this consistent reliability, and since its introduction into products like the MotoX and Galaxy S series smartphones, Sensory's voice models and neural networks have continually evolved to offer better performance. Today, Sensory's latest deep neural network models for embedded AI have allowed the company to deliver a 5X reduction in false accepts compared to version 4.0¹, nearly eliminating the chances of the speech recognition system activating when not actually summoned by the user. A new shallow learning approach takes the biggest speech models and compresses them by a factor of 5 to 10 with no decrease in accuracy. Additionally, the latest neural network models offer greater reliability for user-defined triggers, giving users the option to select the wake word they prefer while retaining the same accuracy and performance offered by specialized fixed triggers.
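As a rough illustration of the trade-off between false accepts and false rejects mentioned above, the sketch below fixes a false reject budget, derives a detection threshold from it, and then measures how often non-wake-word audio would still trigger. It is not Sensory's method: the function names and the score lists are hypothetical, made up only to show the arithmetic, and real evaluations typically report false accepts per hour of audio rather than per sample.

```python
def threshold_for_false_reject(wake_scores, target_fr_rate):
    """Pick a threshold that rejects at most target_fr_rate of genuine wake-word scores."""
    ordered = sorted(wake_scores)
    # Number of genuine utterances we are allowed to miss (clamped to a valid index).
    allowed_rejects = min(int(target_fr_rate * len(ordered)), len(ordered) - 1)
    # Scores strictly below the returned threshold are rejected.
    return ordered[allowed_rejects]


def false_accept_rate(other_scores, threshold):
    """Fraction of non-wake-word samples whose score still reaches the threshold."""
    accepted = sum(1 for score in other_scores if score >= threshold)
    return accepted / len(other_scores)


# Hypothetical detector scores, invented for illustration only.
wake_scores = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.92, 0.88, 0.75, 0.99]
other_scores = [0.1, 0.3, 0.72, 0.5, 0.2, 0.65, 0.4, 0.05, 0.8, 0.15]

threshold = threshold_for_false_reject(wake_scores, target_fr_rate=0.1)
far = false_accept_rate(other_scores, threshold)
print(f"threshold={threshold}, false accept rate={far}")
```

Comparing two recognizer versions at the same false reject setting, as footnote 1 does, would amount to running this calculation once per version and comparing the resulting false accept rates.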

Enhanced Security Makes Sure That It's You Speaking

One of the greatest challenges facing the IoT industry is user and data security. TrulyHandsfree 5.0 includes a layer of security in the voice interface that combines Sensory's expertise in voice biometrics with deep neural nets to authenticate users, limiting who can access the device. TrulyHandsfree 5.0's embedded speaker verification technology is highly flexible, allowing users to enroll their voice and their own custom trigger or passphrase, restricting unauthorized users from accessing the voice user interface. Even if an unauthorized person learns the trigger or passphrase, Sensory's voice biometrics technology will recognize that it's not the enrolled user speaking and will not authenticate them, preventing them from accessing the device.

Advanced Signal Processing for Voice Barge-In and Far-Field Speech Recognition

TrulyHandsfree 5.0 also features new voice barge-in capability, enabled by Sensory's proprietary Acoustic Echo Cancellation (AEC) technology. Users can interrupt a device while it is playing voice prompts, music or other sounds by saying the trigger phrase, and can then control music playback by voice or give any other supported speech command. This provides a more fluid voice user interface experience. Sensory's new AEC technology is tuned specifically to maximize speech recognition accuracy. This boosts the performance not only of the embedded TrulyHandsfree speech recognizer, but also of any cloud-based speech recognition system that the speech requests are passed to.

Further, the overall performance of voice user interface systems is greatly affected by the signal-to-noise ratio of the audio signal received. Previous versions of TrulyHandsfree boasted excellent robustness to noise; with version 5.0, however, Sensory incorporates new deep learning noise suppression algorithms that reduce the level of ambient noise passed to the speech recognizer to ensure that wake words and voice requests are heard clearly, further improving TrulyHandsfree's recognition hit rate. This is especially helpful in home, automotive and mobile applications where background noise can overshadow the user's voice.

Same Low-Power and Efficient Footprint

Today, voice has surpassed all other interface options for a growing list of device categories; however, most devices on the market rely on cloud services for AI processing. Yet these cloud-based solutions cannot be accessed completely hands-free without a client-side voice trigger technology. Many of today's always-listening voice-enabled device applications, especially low-power devices that don't have the required resources to run completely off the cloud, can benefit from a hybrid client/cloud approach that taps TrulyHandsfree technology. TrulyHandsfree is extremely resource- and power-efficient, with ports available for platforms ranging from today's most powerful applications processors to low-power DSPs. For ultra-low power devices that have limited battery capacity, such as wearables, Sensory offers its Low Power Sound Detector (LPSD) hardware component for DSPs and smart microphones, which can reduce low-power configurations of TrulyHandsfree to an average battery draw of less than 1 mA.
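To give a sense of how an average battery draw figure of this kind can arise, here is a minimal sketch that assumes a sound detector keeps the recognizer idle until sound is present, so the average current is a time-weighted mix of an idle draw and an active draw. That gating model is an assumption, not a description of how LPSD works, and the function name and all current values are hypothetical placeholders rather than Sensory measurements.

```python
def average_current_ma(active_fraction, active_ma, idle_ma):
    """Time-weighted average current for a device that is active only part of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_ma + (1.0 - active_fraction) * idle_ma


# Hypothetical numbers: recognizer active 20% of the time at 3 mA,
# detector-only idle draw of 0.1 mA for the remaining 80%.
avg = average_current_ma(active_fraction=0.2, active_ma=3.0, idle_ma=0.1)
print(f"average draw: {avg:.2f} mA")
```

Under this simple model, the quieter the environment (the smaller the active fraction), the closer the average draw gets to the idle figure.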

"The demand for voice user interfaces continues to grow rapidly and TrulyHandsfree 5.0 will allow more manufacturers to incorporate low cost, low power voice user interfaces on device without sacrificing the cloud accuracy," said Todd Mozer, CEO of Sensory. "TrulyHandsfree 5.0 offers the most advanced and efficient embedded AI technologies we've ever created. Additionally, we've set the bar higher than ever before for speech recognition accuracy by applying our new proprietary echo cancellation and noise reduction algorithms that we are confident will boost far-field voice performance for IoT devices of all kinds."

TrulyHandsfree is the most widely deployed embedded speech recognition engine in the world, having enabled a hands-free voice user experience on more than 2 billion devices from leading brands worldwide. Additionally, Sensory can deliver voice triggers for all major IoT cloud services, including Amazon AVS, Apple Siri, Google Assistant and Microsoft Cortana, and provide developer support for cloud service interfaces on Linux, Android, iOS and Windows as well as support for dozens of proprietary DSPs, microcontrollers, smart microphones and other low-power embedded devices.

For more information about this announcement, Sensory or its technologies, please contact [email protected]; Press inquiries: [email protected].

About Sensory
Sensory Inc. creates a safer and superior UX through vision and voice technologies. Sensory's technologies are widely deployed in consumer electronics applications including mobile phones, automotive, wearables, toys, IoT and various home electronics. Sensory's product line includes TrulyHandsfree voice control, TrulySecure biometric authentication, and TrulyNatural large vocabulary natural language embedded speech recognition. Sensory's technologies have shipped in over a billion units of leading consumer products. Visit Sensory at www.sensory.com

TrulyHandsfree is a trademark of Sensory Inc.

Appendix:
1: Offers at least 5X lower false accept rate for a typical robust false reject setting when compared to version 4.0 of TrulyHandsfree.

SOURCE Sensory