A Message About Speech

It’s clear that speech recognition is moving rapidly into the mainstream. After years of false starts and frustrations, Siri, Google Now, Cortana and Amazon’s Echo are becoming preferred interfaces for millions of people.

It’s not difficult to understand the appeal. Voice is a wormhole through cumbersome and often frustrating processes. That’s why people talk in the first place.

I often forget how convenient it is to use Siri on my iPhone, and I revert to hunting and pecking for things. Yes, speech technology has a way to go before it becomes truly stellar, but it is incredibly convenient for many tasks.

Apple is reportedly extending Siri functionality to Macs in its upcoming macOS Sierra release. The speech functionality it has added to the Apple TV is a giant step forward. Meanwhile, all the major players are tapping AI to create a more contextual framework that learns a person’s habits over time in order to deliver more targeted and accurate results.

Ideally, I’d be able to bark commands for home automation and an array of other tasks using Siri, Echo and other speech recognition systems. Although these capabilities theoretically exist today through the various platforms, the reality is that the ecosystems, protocols and technologies are not compatible with one another. It’s essentially 1991 in email years.

Of course, one of the goals with Echo is to sell more stuff through Amazon. The company boasts that it allows Prime members to purchase “tens of millions” of items using the device. This is in addition to ordering an Uber car, checking on your bank balance and playing Pandora.

Business and IT leaders should pay close attention to this space. The reality is that today’s leading-edge apps will soon be tomorrow’s digital debris.

Retailers should be adding voice search to apps so that a consumer can simply say, “Show me sheets for California King beds” or “Display non-stick frying pans.” Apps such as OpenTable and Ticketmaster should allow people to ask, “What nearby Italian restaurant is available for a party of two on Friday night at 7 p.m.?” or “What concerts and shows are available this weekend?”

We’ll get there—and it will be sooner rather than later. We’re still in the early stages of voice interfaces. Yet, over the next few years, the message will come through loud and clear: Voice tools must be embedded in just about every system that people use.