Having your iPhone respond to “Hey Siri” seems like such a simple thing, but it’s actually quite complicated. Recognizing this code phrase, and the person who said it, is critical for Apple’s speech-recognition system.
A post published today in Apple’s Machine Learning Journal describes many of the challenges developers overcame to make this work.
One of the complications is that recognizing “Hey Siri” has to happen on the iPhone or iPad. Most of Siri’s speech recognition is done by uploading the user’s words to a remote server, but that only begins after the “Hey Siri” phrase has been recognized by the phone. Apple’s commitment to privacy prevents the iPhone from sending everything it hears to a server.
Every iPhone and most Apple tablets since the iPhone 6s have had a low-power, always-on processor that continuously listens for the key phrase “Hey Siri”. That’s all this chip does. This voice-recognition processor runs a neural network, a computing structure loosely modeled on the layout of a living brain.
The Machine Learning Journal article covers only “Hey Siri” because the rest of Siri’s speech recognition is done on servers. That’s an entirely different process, and one with its own raft of problems. Still, Apple is on a hiring spree to fix them.
Why “Hey Siri”?
Apple picked its key phrase because it’s short and easy to say. The Siri voice-recognition system debuted on the iPhone 4S several years earlier, but required pressing the Home button to activate. According to Apple, many people started their requests with “Hey Siri” even before this phrase had a role.
The downside is that this key phrase resembles many other phrases, such as “are you serious?”. The iPhone’s dedicated processor also has to deal with all the other people chattering nearby, some of whom might be talking to their own iPhones.
According to today’s Machine Learning Journal article, the chip first picks the phrase “Hey Siri” out of what it hears, then it checks whether the phrase was said by the person it was trained to listen for.
The processor turns the audio into a 13-dimensional vector to recognize that someone said “Hey Siri”. It then converts the audio into a 442-dimensional vector to see whether the correct speaker uttered the key phrase.
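The two-stage idea can be sketched in a few lines of code. Only the vector sizes (13 and 442 dimensions) come from Apple’s article; everything else here — the cosine-similarity scoring, the thresholds, and the function names — is a hypothetical illustration, not Apple’s actual method.

```python
import numpy as np

PHRASE_DIM = 13      # per-frame acoustic vector size (from the article)
SPEAKER_DIM = 442    # speaker vector size (from the article)

# Hypothetical score cutoffs, chosen only for illustration.
PHRASE_THRESHOLD = 0.8
SPEAKER_THRESHOLD = 0.7

def cosine_similarity(a, b):
    """Similarity between two vectors, in the range [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_hey_siri(frame_features, phrase_model):
    """Stage 1: score each 13-dimensional audio frame against a phrase
    model and decide whether 'Hey Siri' was spoken at all."""
    score = np.mean([cosine_similarity(f, phrase_model) for f in frame_features])
    return bool(score >= PHRASE_THRESHOLD)

def verify_speaker(utterance_vector, owner_profile):
    """Stage 2: compare the 442-dimensional utterance vector with the
    owner's stored voice profile."""
    return bool(cosine_similarity(utterance_vector, owner_profile) >= SPEAKER_THRESHOLD)
```

The key design point, which the article does describe, is that the cheap first stage runs constantly on the low-power chip, and the more expensive speaker check only runs once the phrase itself has been spotted.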
Apple posted the details of how it picks the all-important phrase out of the air in a Machine Learning Journal article in October. The newest post discusses how the neural chip learns to recognize its owner.
Training “Hey Siri”
Everyone remembers having to train their iPhone to recognize their voice by saying “Hey Siri” several times. This is called explicit enrollment.
What very few people realize is that the system continues to learn what their voice sounds like after the training session. This is because the session is almost always done under ideal conditions, while the iPhone has to learn to recognize “Hey Siri” amid all kinds of ambient noise. For some time after training officially ends, every use of “Hey Siri” helps the system learn more.
So try to avoid letting other people say “Hey Siri” near your iPhone while it’s still learning your voice.
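One simple way to picture this ongoing (“implicit”) enrollment is a profile that is the running average of every accepted utterance vector. Apple doesn’t publish its actual update rule, so this running-mean sketch is purely an assumption for illustration:

```python
import numpy as np

SPEAKER_DIM = 442  # speaker vector size from Apple's article

class SpeakerProfile:
    """Hypothetical sketch: the stored voice profile is the running mean
    of every utterance vector the system has accepted so far."""

    def __init__(self, enrollment_vectors):
        # Explicit enrollment: average the initial "Hey Siri" recordings
        # made during the setup session.
        self.profile = np.mean(enrollment_vectors, axis=0)
        self.count = len(enrollment_vectors)

    def update(self, utterance_vector):
        # Implicit enrollment: fold an accepted real-world utterance into
        # the running mean, so noisy everyday conditions are represented.
        self.count += 1
        self.profile += (utterance_vector - self.profile) / self.count
```

This also makes the warning above concrete: if someone else’s “Hey Siri” gets accepted during the learning period, their voice gets averaged into the owner’s profile.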
Apple set itself a difficult task when it decided to do voice recognition directly on a smartphone. But the alternative was to send recordings of everything said near the iPhone to a remote server to recognize the key phrase. Apple wasn’t going to turn its devices into spies.
Of course, that didn’t bother Amazon. That’s exactly how its Echo devices do all their speech recognition.