The next few years will see voice automation take over many aspects of our lives. Although voice won’t change everything, it will be part of a movement that heralds a new way to think about our relationship with devices, screens, our data and interactions.
We will become more task-specific and less program-oriented. We will think less about items and more about the collective experience of the device ecosystem they are part of. We will enjoy the experiences they make possible, not the specifications they celebrate.
In the new world, I hope we relinquish the role of slave that we play today and get back in control.
Voice won’t kill anything
The standard way that technology arrives is to augment more than replace. TV didn’t kill the radio. VHS and then streamed movies didn’t kill the cinema. The microwave didn’t destroy the cooker.
Voice more than anything else is a way for people to get outputs from and give inputs into machines; it is a type of user interface. With UI design we’ve had the era of punch cards in the 1940s, keyboards from the 1960s, the computer mouse from the 1970s and the touchscreen from the 2000s.
All four of these mechanisms are around today and, with the exception of the punch card, we freely move between input types based on context. Touchscreens are terrible in cars and on gym equipment, but great for tactile applications. Computer mice are great for pointing and clicking. Each input does very different things brilliantly and badly, and we have learned the best use for each.
Voice will not kill brands, it won’t hurt keyboard sales or touchscreen devices — it will become an additional way to do stuff; it is incremental, not cannibalistic.
We need to design around it
Nobody wanted the computer mouse before it was invented. In fact, many were perplexed by it because it made no sense in the previous era, where we used command lines, not visual icons, to navigate. When I worked with Nokia on touchscreens before the iPhone, the user experience sucked because the operating system wasn’t designed for touch. 3D Touch still remains pathetic because few software designers got excited by it and built for it.
What is exciting about voice is not bolting voice interaction onto current systems, but imagining new applications, interactions and use cases we’ve never seen.
At the moment, the burden is on us to fit around the limitations of voice, rather than have voice work around our needs.
A great new facade
Have you ever noticed that a company’s desktop website is usually its worst digital interface? Its mobile site is likely better, and its mobile app best of all. Most airline, hotel or bank apps don’t offer pared-down experiences (as was once the case), but their very fastest, slickest experience with the greatest functionality. What tends to happen is that new things get new capex, the best people and the most ability to bring change.
However, most digital interfaces are still designed around the silos, workflows and structures of the company that made them. Banks may offer eight different ways to send money to someone or something based around their departments; hotel chains may ask you to navigate by their brand of hotel, not by location.
The reality is that people are task-oriented, not process-oriented. They want an outcome and don’t care how. Do I give a crap if it’s Amazon Grocery or Amazon Fresh or Amazon Marketplace? Not one bit. Voice allows companies to build a new interface on top of the legacy crap they’ve inherited. I get to “send money to Jane today,” not press 10 buttons around their org chart.
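To make the facade idea concrete, here is a minimal, purely illustrative sketch: a single task-oriented intent (“send money”) that hides whichever legacy departmental system actually fulfils it. Every function and rule here is an assumption invented for illustration, not any bank’s or assistant’s real API.

```python
# Hypothetical sketch: voice as a facade over legacy silos.
# All names and routing rules below are invented for illustration.

def legacy_wire_transfer(recipient, amount):
    # One departmental silo (e.g. the wires team's system).
    return f"wired {amount} to {recipient}"

def legacy_p2p_payment(recipient, amount):
    # A different silo with its own workflow.
    return f"p2p-paid {amount} to {recipient}"

def send_money(recipient, amount):
    """Facade: pick the right legacy path so the user never has to."""
    if amount >= 1000:
        return legacy_wire_transfer(recipient, amount)
    return legacy_p2p_payment(recipient, amount)

def handle_utterance(utterance):
    # Toy keyword matching; a real assistant would use an NLU model.
    words = utterance.lower().split()
    if "send" in words and "money" in words and "to" in words:
        recipient = words[words.index("to") + 1].capitalize()
        return send_money(recipient, 50)  # amount hard-coded for the demo
    return "Sorry, I didn't catch that."

print(handle_utterance("Send money to Jane today"))
# → p2p-paid 50 to Jane
```

The point of the design is that the org chart (wires vs. p2p vs. anything else) lives behind `send_money`; the user only ever expresses the outcome.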
It requires rethinking
The first time I showed my parents a mouse and told them to double-click, I thought they were having a fit. The cursor would move in jerks and often get lost. The same dismay and disdain I once felt watching them, I now feel every time I try to use voice. I have to reprogram my brain to think about information in a new way and to reconsider how my brain works. While this will happen, it will take time.
What gets interesting is what happens to the 8-year-olds who grow up thinking of voice first, and what happens when developing nations embrace tablets with voice, not desktop PCs, to educate. When people grow up with something, their native understanding of what it means and what it makes possible changes. It’s going to be fascinating to see what becomes of this canvas.
Voice as a connective layer
We keep being dumb and thinking of voice as the way to interact with “a” machine, not as a glue between all machines. Voice is an inherently crap way to get outputs; if a picture paints a thousand words, how long will it take to buy a T-shirt? The real value of voice is as a user interface across all devices. Advertising in magazines should offer voice commands to find out more. You should be able to yell at the Netflix carousel, or at TV ads to add products to your shopping list. Voice won’t be how we “do” entire things; it will be how we trigger or finish things.
We’ve always assumed we talk to devices first. Do I really want to remember the command for turning on lights in the home and utter six words to make it happen? Do I want to always be asking? Assuming devices are selective about when they speak first, it’s fun to see what happens when voice is proactive. Imagine the possibilities:
- “Welcome home, would you like me to select evening lighting?”
- “You’re running late for a meeting, should I order an Uber to take you there?”
- “Your normal Citi Bike station has no bikes right now.”
- “While it looks sunny now, it’s going to rain later.”
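Prompts like the ones above can be sketched as simple rules that fire on context rather than waiting to be asked. This is a hedged illustration only; the context keys, thresholds and wording are all assumptions, and a real assistant would rank and throttle prompts rather than emit every match.

```python
# Illustrative sketch of proactive voice: rules fire on context
# instead of waiting for a command. All keys and rules are invented.

def proactive_prompts(context):
    """Return the prompts whose trigger conditions hold in this context."""
    prompts = []
    if context.get("arrived_home"):
        prompts.append("Welcome home, would you like me to select evening lighting?")
    if context.get("minutes_until_meeting", 999) < 10:
        prompts.append("You're running late for a meeting, should I order a ride?")
    if context.get("bike_station_bikes", 1) == 0:
        prompts.append("Your normal bike station has no bikes right now.")
    if context.get("rain_forecast"):
        prompts.append("While it looks sunny now, it's going to rain later.")
    return prompts

print(proactive_prompts({"arrived_home": True, "rain_forecast": True}))
```

The interesting design question is not the rules themselves but who writes them: hand-coded triggers like these versus learned suggestions is exactly the trade-off the platforms are now exploring.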
While many think we don’t want to share personal information, there are ample signs that if we get something in return, trust the company and see transparency, it’s OK. Voice will not develop alone; it will progress alongside Google suggesting email replies, Amazon suggesting things to buy, Siri contextually suggesting apps to use. We will slowly become used to the idea of outsourcing our thinking and decisions somewhat to machines.
We’ve already outsourced a lot; we can’t remember phone numbers, addresses, birthdays — we even rely on images to jog our recollection of experiences, so it’s natural we’ll outsource some decisions.
The medium-term future in my eyes is one where we allow more data to be used to automate the mundane. Many think that voice is asking Alexa to order Duracell batteries, but it’s more likely to be never thinking about batteries or laundry detergent or other low-consideration items again, because they are replenished automatically.
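Automating the mundane in this sense is less about voice commands than about a quiet replenishment loop. A minimal sketch, assuming a hypothetical inventory estimate measured in days of supply remaining (all item names and thresholds here are invented):

```python
# Hedged sketch of automated replenishment: reorder low-consideration
# items before they run out, so nobody has to ask. Data is illustrative.

def items_to_reorder(inventory, threshold_days=3):
    """Return items whose estimated days of supply fall at or below the threshold."""
    return [name for name, days_left in inventory.items()
            if days_left <= threshold_days]

pantry = {"batteries": 10, "laundry detergent": 2, "coffee": 1}
print(items_to_reorder(pantry))
# → ['laundry detergent', 'coffee']
```

The hard part in practice is estimating `days_left` from purchase history and usage, which is exactly where the extra data sharing discussed above comes in.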
There is an expression that a computer should never ask a question for which it can reasonably deduce the answer itself. When a technology is really here we don’t see, notice or think about it. The next few years will see voice automation take over many more aspects of our lives. The future of voice may be some long sentences and some smart commands, but mostly perhaps it’s simply grunts of yes.