Despite the continued development of voice assistants such as Alexa, Siri, Google Assistant, and Cortana, no company appears to have considered how awkward it can be to issue commands in public. Until now.
The awkwardness of using phone-based voice assistants in public is one of their most fundamental flaws. As frequently as I use my Harman/Kardon Invoke at home to control smart lights, get calendar information, and create Cortana reminders, I rarely use Cortana on my Android phone for one simple reason: it’s just a bit… strange, at least in public. Microsoft appears to concur, as the company has patented a module capable of detecting “silent” voice commands.
The “silent” input method, as described in the company’s patent application, can detect whispers and extrapolate voice commands from the airflow created while mouthing words. The module is compatible with a variety of devices, such as smartwatches, phones, a smart “ring,” regular headset microphones, and even a TV remote.
As usual, keep in mind that patents do not necessarily translate into products, but there have been recent rumors that Microsoft is still considering Cortana-focused hardware. We must simply wait and see.
Although the performance of voice input has vastly improved, it is still rarely used in public spaces, in offices, or even in homes. This is primarily because voice leakage in a quiet environment can disturb and even irritate nearby people. In addition, private information may reach unintended listeners. These are social issues, not technical ones, so there is no simple solution even if the performance of voice recognition systems is greatly enhanced. The implementations of the subject matter described in this document provide an undetectable voice input solution. Unlike conventional voice input solutions, which are based on normal speech or whispering and use egressive (breathing-out) airflow while speaking, the proposed “silent” voice input method uses the opposite, ingressive (breathing-in) airflow while speaking. By placing the apparatus (e.g. a microphone) very close to the user’s mouth, with a small gap between the mouth and the apparatus, the proposed silent voice input solution can capture a stable utterance signal with very little voice leakage, allowing the user to use ultra-low-volume speech input in public and mobile situations without disturbing nearby people. Apart from the direction of airflow (ingressive rather than egressive), the manner of speaking is the same as usual, so the proposed method can be used without special training.