As far too many people have pointed out already, Apple’s event this morning was a bit of a disappointment — as a spectacle. 16 months is a long time to wait for an incremental hardware improvement, which the iPhone 4S unquestionably is. But there was one announcement amid all the recap of iOS 5 and iCloud features that should have been tremendously exciting to anyone who cares about the future of interactions: Siri, the voice-activated assistant exclusive to the iPhone 4S.
As always happens when Apple rolls out a major technology (in this case, e-concierge services), critics are happy to point out that Cupertino is late to the party (can you believe that it took them 5+ years to respond to the Treo?!). Specifically, they’re calling Siri a catch-up effort to match the Google Voice Actions technology that’s been available on Android for well over a year. Having used Voice Actions for a while now, I can confirm that this is half-true. On a feature-by-feature basis, Siri looks me-too. But from an experience standpoint, it’s totally different. As usual, Google’s implementation is process-oriented. Apple’s, unsurprisingly, is human and friendly. And this is why Siri has the potential to be revolutionary.
What do you notice? Superficially, the biggest difference is that Google emphasizes that what you’re talking to is a piece of technology. Your commands have a unified structure they need to follow. For example, you would say, “Call <name of user as written in address book> at <location>” in order to place a phone call to anyone, regardless of who that person is to you. All people are treated the same. Siri, on the other hand, makes things as personal as possible. Depending on the person’s relationship to you, you can use friendlier language, like “Call my wife.” It feels a lot more like talking to a person, which makes the conversation both more natural and more magical.
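The contrast can be sketched as a thin “relationship” layer sitting in front of an ordinary contact lookup. This is purely a hypothetical illustration, not how either Google or Apple actually implements it; the contact names, phone numbers, and function names are all invented for the example:

```python
import re

# Invented sample data for illustration.
CONTACTS = {"Anne Example": "555-0100"}
# The "personal" layer: relationship phrases mapped to contact names.
RELATIONSHIPS = {"my wife": "Anne Example"}

def voice_actions_style(command):
    """Rigid grammar: 'call <name exactly as written in the address book>'."""
    m = re.match(r"call (.+)", command, re.IGNORECASE)
    if m and m.group(1) in CONTACTS:
        return CONTACTS[m.group(1)]
    return None  # "Call my wife" fails: not a literal contact name

def siri_style(command):
    """Resolve relationship phrases before falling back to the literal name."""
    m = re.match(r"call (.+)", command, re.IGNORECASE)
    if not m:
        return None
    spoken = m.group(1)
    name = RELATIONSHIPS.get(spoken.lower(), spoken)
    return CONTACTS.get(name)

print(voice_actions_style("Call my wife"))  # None
print(siri_style("Call my wife"))           # 555-0100
```

The functional difference is one dictionary lookup; the experiential difference is that the second version lets you speak the way you would to a person.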
Or take something significantly more straightforward and functional: weather. On Google Voice Actions, you talk like you’re dictating a Google search: “weather San Francisco Oct. 7”. And then it displays the Google search results you’d get on a web page. That’s about it. On Siri, you say, “What’s the weather going to be like when I’m in San Francisco? How about Napa?” And the app responds in fairly natural language while also displaying the information it’s talking about in visual form.
From a functional standpoint, this is identical. From an experiential standpoint, it’s worlds apart. Google wants to turn your voice into a keyboard and mouse. Apple wants you to have a different kind of relationship to your phone that looks a lot like your relationships to people.
There is good reason for this. Virtually anyone who might use Siri has experience with (terrible) voice recognition software, including Apple Voice Control, the inept tool included in iOS 3 that was basically Google Voice Actions except wildly inaccurate (especially for recognizing Indian names) and capable only of placing phone calls and playing back music. I think it’s safe to say that most people’s experience of voice control looks a lot like that. Apple has chosen to make Siri respond less like typical voice recognition and more like a helpful human assistant in part to avoid some of our negative associations with the underlying technology.
It’s a good approach. Tellme, the startup Mike McCue founded prior to Flipboard (now a Microsoft technology that seems at least partly responsible for the voice side of Kinect’s magic), used human-sounding voice interfaces with pauses and complete sentences to make customer service lines way less terrible. That kind of power can now be processed on a phone in near real time.
That’s what makes me so excited about Siri — it isn’t merely an improvement over the crummy consumer voice recognition that we’ve all been exposed to over the last decade. It’s taking technology and intelligence that previously required brawny hardware and putting it in the palms of our hands. It’s the Knowledge Navigator. Finally.