Updated: Feb 10, 2021
Voice assistants and smart speakers: battling for the smart home
There appears to be no escape from smart speakers. Some forecasts predict global market growth from $7.5B in 2020 to $32B in 2025. In early 2020, more than one-third of US adults had a smart speaker in their home, and about 50% of owners say they use it daily. While they can be useful for simple actions, their support for major household tasks such as cooking or laundry is still quite limited.
Voice assistants are the cloud services that receive voice commands from smart speakers and other devices, turn speech to text, interpret the resulting language, detect the intent of the command, and use “skills” to determine appropriate actions. These actions are executed either by the smart speakers themselves or by other Internet-connected devices in the home. Currently, Amazon’s Alexa leads the voice assistant market, with Google Assistant catching up rapidly and Apple’s Siri expanding as well.

While each voice assistant requires hardware for the speakers and microphones, the services they convey are the real value for their companies’ business models. Amazon enables purchases from its commerce service, Google favors access to its services and uses the data for advertising, and Apple uses Siri to support its ecosystem of hardware and services. Each of the big three voice assistant providers wants to build a large ecosystem of connected devices controlled through its voice assistant, and those ecosystems are expanding rapidly, trying to link every connectable device in the home.

Third-party speaker makers, for instance B&O, Sonos, and Harman Kardon, offer their own smart speakers using voice assistant services from one or more of the three market leaders. Amazon is promoting its Alexa Voice Services (AVS) to manufacturers to create Alexa Built-in products, which connect directly to Alexa without separate speakers as intermediaries. In another market development, smart speakers are being combined with displays and positioned as multi-function home hubs.

Consumers must decide to which voice assistant ecosystem they should subscribe. While Amazon Alexa and Google Assistant have a dominant lead in smart speaker-based voice assistants, Apple's Siri is used widely with smartphones, tablets, and computers.
This decision becomes even more important with the introduction of voice assistant routines. These routines, or shortcuts, can invoke a series of commands to multiple devices. Naturally, for this to work, all devices involved must support the same voice assistant ecosystem. This creates a lock-in effect, leading to increased market power for the leading assistant services and to reduced competition.
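To make the lock-in mechanics concrete, here is a minimal sketch of how a routine fans a single trigger phrase out into fixed, single-shot device commands. The device names, the routine table, and the send_command() helper are all hypothetical illustrations, not any vendor's API.

```python
# A routine maps one spoken phrase to a fixed list of (device, action)
# pairs. Every device listed must be reachable through the same
# assistant ecosystem -- the source of the lock-in effect.
ROUTINES = {
    "good morning": [
        ("bedroom_light", "on"),
        ("thermostat", "set 21"),
        ("kitchen_speaker", "play news"),
    ],
}

def send_command(device: str, action: str) -> str:
    # In a real ecosystem this would call the vendor's cloud API;
    # here we just record what would be sent.
    return f"{device} <- {action}"

def run_routine(phrase: str) -> list:
    # Dispatch each stored command in order; unknown phrases do nothing.
    return [send_command(d, a) for d, a in ROUTINES.get(phrase, [])]
```

A household that bought the hypothetical bedroom_light from a brand outside the ecosystem simply cannot include it in this routine, which is exactly the switching cost described above.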
But are people using them to control smart appliances?
So far consumers appear to use their smart speakers mostly for entertainment and information services, with the responses conveyed by the associated smart speakers. Control of smart home devices does not yet appear to be a major factor in voice assistant use. However, the dramatic increase in voice assistant adoption requires that manufacturers of home appliances support all major voice assistants; otherwise, they risk consumers not considering their brands. To simplify the use of their appliances, some manufacturers are looking to use built-in microphones and speakers to connect them directly to voice assistant services in the cloud, eliminating the need for smart speakers. However, this trend is nascent, with no significant use yet.
Beyond the humble thermostat or room lighting, manufacturers are looking to connect the rest of the kitchen, from refrigerators to ranges, to the Internet. However, our informal survey shows that only a small fraction of consumers actually connect their connectable appliances to the Internet, and fewer still enable voice assistant support and use it regularly. Whether because of limited value or security concerns, there is little momentum for consumers to change the way they operate their homes by connecting. The desire to connect appliances directly to voice services has its own challenges: if appliance developers use dedicated voice chips, e.g. for Amazon Alexa or for Google Assistant, would they need a chip for each voice assistant to which they want to connect?
What is different about controlling major appliances?
When voice assistants are used with consumer devices, simple single-action commands are most popular. For instance, in a townhouse where thermostats on several floors can respond to a single voice command, changing the temperature by voice saves the user a trip up and down the stairs. This is clearly a convenience. Routines or shortcuts can be used to string some of these commands together, still resulting in simple actions.
However, the usage context of major appliances is quite different.
Controlling a range as part of cooking a meal requires far more complex interactions than setting the temperature for a room or even an entire apartment.
Major appliances are used for household tasks that are generally quite complex. They often involve many steps at different times, multiple appliances, and manual actions.
The usage scenarios are highly individual, differing widely between households, and even within households depending on the context.
Manufacturers are looking to have a single app for all their appliances. But handling broader household tasks through one brand's app would require that consumers buy all their appliances from that brand. This may work in some markets, such as rental apartments or new developments, but not in many others.
The disconnect between what consumers want and what manufacturers think they want in connected experiences was made clear in a report by IBM’s Institute for Business Value. They conducted a survey of manufacturers’ executives and consumers, ranking the motivations for digital consumer experiences.
Executives might call a time out and rethink whether they deliver what customers really want: more time, more convenience, faster results, and easier processes. Where does a speaker, or a voice command, fit in? How, and whom, does it help? Technical perfection is not sufficient for broad success with consumer solutions; there is also a social component to driving success. Mainstream users have to agree on what the problem is, and that the proposed solution solves this problem. This requires patient experimentation and communication.
Voice assistants have some inherent limitations
In many ways, voice assistants simply replace button pushes with voice commands. It is still up to consumers to consider the context of an action. When multiple devices are involved in achieving a high-level objective such as cooking a meal, users have to orchestrate the device actions with each other and with the other activities required to reach the goal. The assistant isn’t smart, and it’s arguable whether telling a speaker to raise the oven temperature is especially helpful. Consider these points:
Voice assistants use single commands. For now, these consist mostly of fixed phrases. Effectively, they push one button or set one dial. Routines may string multiple button pushes together at one time.
As more flexible natural language understanding technologies are applied, interpretations of speech commands may become ambiguous. With commands resulting in actions, misunderstandings can be risky. Did I really want to set the oven to 600 degrees? Do we need "guard rails"?
Voice assistants support only one-way “conversations”. The appliances cannot talk back, asking for clarification of intent. Building checks into the skills executed in the cloud does not solve this problem.
The commands are independent of the state of the device. The user has to know whether an oven is on when the heat should be turned lower, etc.
The stateless aspect of the voice commands also limits the ability to support action sequences if those actions depend on the state of the device. Have I turned on the exhaust before I turn on a burner on the stove?
Appliances generally cannot initiate conversations or give alerts by saying, for instance, that the clothes washer is finished, or that the pot on the stovetop is boiling over.
In many cases, only a subset of the appliance functionality is accessible via voice assistant. This can be for safety reasons: a stovetop burner should be turned on only when an adult is in the kitchen. Other functions are complex and depend on state external to the appliance itself; an example is the seemingly simple task of bringing water to a boil and cooking pasta until it is tender.
Voice assistants cannot integrate context data: Who is in the kitchen? Is there milk in the refrigerator?
Voice assistants typically do not remember history — how did we do this the last time? (And given the privacy implications, would we like them to remember history?)
Voice assistants depend on an Internet connection. As Internet connections are less than fully reliable, consumers cannot rely on voice assistants to complete tasks.
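Several of the points above, the state-independence of commands and the need for "guard rails", can be sketched in a few lines. The Oven class, its safety limit, and its responses are purely illustrative assumptions, not any manufacturer's actual interface.

```python
MAX_SAFE_TEMP_F = 550  # guard rail: reject implausible setpoints

class Oven:
    """Toy model of a state-aware appliance that a stateless
    voice command cannot see into."""

    def __init__(self):
        self.is_on = False
        self.temp_f = 0

    def handle_command(self, action: str, value: int = 0) -> str:
        if action == "turn_on":
            self.is_on = True
            return "ok: oven on"
        if action == "set_temp":
            if value > MAX_SAFE_TEMP_F:
                # Did I really want 600 degrees? Refuse instead of acting.
                return f"refused: {value}F exceeds safe limit"
            if not self.is_on:
                # A one-way voice command has no channel for this reply.
                return "clarify: the oven is off -- turn it on first?"
            self.temp_f = value
            return f"ok: oven set to {value}F"
        return "unknown command"
```

The two refusal branches are exactly what today's one-way, stateless voice pipelines lack: the first needs a guard rail on the value, the second needs knowledge of device state and a way to talk back.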
These shortcomings keep voice assistants transactional rather than genuinely helpful. They need a semantic level of interaction to support more complex household tasks, including the situational context of the action. For instance, sometimes we want to prepare a meal as quickly as possible, with the least amount of work; other times, we may experiment with recipes and enjoy the cooking process. Today, voice assistant actions cannot be tailored to specific situations in a household, such as who is home for dinner or what ingredients are at hand.

The lack of history inhibits the ability to learn consumers’ preferences. It also prevents the automation of actions so that they require no voice input, or any input at all, which would make them more universal. Household tasks are highly personal. How do we personalize actions? AI is not yet up to the task and involves other risks, and expecting consumers other than early adopters to program complex tasks would require revolutionary improvements in user interfaces.
Conversely, some simple tasks such as managing lights or temperature in a room might no longer require any voice commands. They could become completely automated simply by human presence, without a single word spoken/heard/misheard/misinterpreted.
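Such presence-driven automation can be reduced to a sensor-to-actuator rule with no spoken word in the loop. The sensor inputs and the threshold below are hypothetical, chosen only to illustrate the idea.

```python
def desired_light_state(occupied: bool, lux: float) -> str:
    """Decide the light state from an occupancy sensor and an ambient
    light reading -- no voice command, no cloud round trip.

    The 50-lux darkness threshold is an illustrative assumption."""
    if occupied and lux < 50.0:
        return "on"
    return "off"
```

For example, desired_light_state(True, 10.0) yields "on" for a person in a dark room, while an empty room or bright daylight keeps the lights off, and nothing was spoken, heard, or misheard.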
Safety, Security, and Privacy must be considered as well!
Privacy considerations are becoming an increasing barrier to consumers’ adoption of voice assistants. The voice data are transmitted to the cloud for analysis, often stored there indefinitely. While in general only voice commands issued after the wake word are sent to the cloud, some mishaps illustrate serious risks. The associated software is inherently complex, so occasional failures are to be expected. Even in the absence of software failures, however, voice assistant data, particularly when combined with other data, create a potentially serious privacy exposure: an extremely detailed view of people’s home lives.

Both Google and Amazon keep a copy in the cloud of every voice command you send to their smart speakers. If somebody gets hold of a phone associated with the account, they can see all conversations. The speakers may also be listening all the time, not just after their wake word, sometimes by accident and sometimes intentionally, for instance when smart speakers are used for acoustic security surveillance. There are sayings about not airing dirty laundry on social media; voice assistants may be much more intrusive. This could lead to a backlash against connected devices. In fact, privacy concerns have jumped to second place among the reasons consumers don't own smart speakers.
For now, we are just at the beginning of the journey towards mitigating privacy risks. Voice assistant privacy risks are just part of the larger public conversation about consumer privacy that appears to be gathering momentum. A large part of the solution will have to come from regulations such as the GDPR legislation in Europe. Fines are starting to be levied, and Google was among the first and largest targets. However, there can also be technical solutions that mitigate the risks and give consumers more control over their data; a recent proposal to protect consumer data using blockchain technology is an example. Also consider one last point: physical safety. Voice commands to devices are often "physical" requests: start an appliance or open a door. This opens the potential for serious safety and security risks in addition to privacy concerns. While the potential damage of in-home use is limited to that home, these risks are part of a broader set of risks associated with the Internet of Things.
True advances require integration of context
As the computing power in appliances increases and becomes less expensive at the same time, much of the voice assistant function performed in the cloud today can be built into the appliances themselves. This will improve both reliability and privacy. It will also give consumers more control, and support personalization and customization to specific contexts.

To provide true convenience and ease of use, we have to use voice control in conjunction with other sensor inputs and external data sources. This is best accomplished by apps that have access to a broad set of data: the appliance state, all appliance functions, sensors in the environment, related devices, and external data sources. These apps can also develop and use personal profiles and histories, and keeping the data local improves privacy.

However, true advances will come not from merely translating button pushes and knob turns into voice commands. We need to use artificial intelligence to raise the semantic level of the interactions by focusing on the high-level objectives of a task, rather than on individual operational actions. This shifts the focus from the appliances to the life purpose for which they are being used. We need to start by rethinking how to accomplish objectives such as cooking a meal, maintaining clean clothes, saving energy, or keeping the home secure. Then we can determine what kinds of devices can best support those objectives, and how to use them in personalized contexts. We are seeing some manufacturers moving in this direction.

A key challenge is to integrate voice assistants into these broader, more capable platforms. To overcome many of these limitations, voice interaction (not just control, but also feedback) must be deeply integrated into the appliance design from the ground up. This change of focus will help us realize the promise of the smart home.

Note: An earlier version of this blog was published on Medium in January 2019. This update addresses some of the interesting developments that have happened since.