Wearable smart camera understands silent voice commands | Cult of Mac



The wearable infrared camera can see what you're saying.
Photo: Cheng Zhang/Cornell University

Ever been in a situation where your hands aren’t free but you can’t speak out loud to the Siri voice assistant, either? A camera could be coming that understands your voice commands even if you’re not making a sound.

Two Cornell University researchers created a wearable infrared smart camera that detects voice commands not by sound, but by measuring movements in the neck and face from under the chin.

Wearable smart camera with silent speech detection

The two researchers dubbed the wearable camera “SpeeChin.” They’re Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and Cornell doctoral student Ruidong Zhang.

They said it’s the first necklace-based silent-speech recognition device that can detect 54 silent speech commands in English and 44 in Chinese.

“Imagine when your hands are occupied or you simply don’t want to reach out to your smart devices to interact with them, you might want to use voice control,” Assistant Professor Zhang said. “However, if you are in a noisy place or in a meeting, voice control is not efficient or socially appropriate. This is where silent speech comes into place.”

SpeeChin’s neck-mounted infrared camera captures the movement of the chin from below. Even when no audible sound is made, that movement lets the device determine which words are being spoken.

Worn around the neck, the camera is more subtle than one mounted in front of the speaker’s face. It also should not raise privacy concerns, because it sits at an angle where it can’t capture other people’s faces.

High reliability, but only under certain conditions

Gizmodo reported that the researchers tested SpeeChin with 20 participants. Ten spoke 54 simple phrases, including numbers and common voice-assistant commands, in English. The other 10 spoke 44 simple words and phrases in Mandarin Chinese. After being “trained,” the camera could recognize commands in English with 90.5% accuracy and in Chinese with 91.6% accuracy.

But the camera only earned those high marks when participants sat still. When they moved, recognition reliability fell as walking gait and head movement varied.

That would seem to limit the settings in which one could reliably use the SpeeChin device, unless improvements are made, such as longer training sessions that incorporate movement or, perhaps, more advanced camera hardware with higher resolution and frame rates for more detailed detection.