What designers need to know about designing for voice apps
Designing apps that use speech recognition software is fundamentally different from designing screens. Without visual guides, designers must focus on the UX writing and voice interaction of the app.
Creating a voice user interface means the personality of the app is unseen and must be written. Designing for screens, by contrast, typically means building the feel and personality into the look, using color and layout to guide users. Unlike the content of a website, a voice-controlled app must be able to converse with the user. A voice app should also be able to function without users modifying the way they speak to use it.
How does functionality work?
Speech recognition software captures voice input as sound waves, which are then analyzed and converted into digital data the software can interpret. While it’s still an imperfect science, speech recognition has not only made using technology quicker and easier than before, but has also opened access to more users by forgoing the need to see a screen or physically click around a webpage.
With AI becoming more and more sophisticated in accounting for variances in speech patterns, speech recognition software opens the door for innovative applications. However, in order to fully capitalize on the utility of this software, good design is crucial.
Personas for voice apps
Not only will the app need to be designed with a persona for the users in mind, but the app itself needs a persona to interact with users. Interaction that feels more personalized will rely not only on the users’ inputs, but also on the way in which the app responds.
An app that feels more personalized will be more successful than an app that gives robotic and contrived conversations.
Writing a personality creates a more meaningful and charming interaction. As an example, when the app needs a specific size or quantity modifier to complete a task, there is a difference between the app saying “Please specify size and quantity of your order” versus “What size and how many pizzas do you want?”
Voice app usability
Usability must come from the way the app registers user commands. Just as forcing users to work around buttons and screen layout is not user-friendly, forcing users to talk in an unnatural way to make an app work is not user-friendly either.
What is the flow?
Flow determines how easy the app will be to use. Using the user persona as a guide for how a user will interact with the app, a designer can think through potential prompts and how a conversation that leads to the user’s end goal might sound.
Essentially, a user flow becomes a script. With a script, an app must not only anticipate the user’s prompts, but also accept prompts that would come up in a normal human-to-human conversation.
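One way to picture a flow as a script is slot filling: the app knows which details it still needs and asks only for those, in whatever order the user offers them. The sketch below is a minimal illustration built on the pizza example; the `OrderFlow` class, the slot names, and the prompt wording are all hypothetical and not taken from any particular voice platform.

```python
from dataclasses import dataclass, field

# Hypothetical slots and questions for the pizza example (illustrative only).
PROMPTS = {
    "size": "What size pizza do you want?",
    "quantity": "How many pizzas do you want?",
}

@dataclass
class OrderFlow:
    slots: dict = field(default_factory=dict)

    def fill(self, **values):
        """Record whatever details the user volunteered, in any order."""
        for name, value in values.items():
            if name in PROMPTS:
                self.slots[name] = value

    def next_prompt(self):
        """Return the question for the first missing slot, or None when done."""
        for name, question in PROMPTS.items():
            if name not in self.slots:
                return question
        return None

flow = OrderFlow()
flow.fill(quantity=2)       # the user said "two pizzas" before being asked
print(flow.next_prompt())   # the app asks only for the missing size
```

Because the flow tracks what is missing rather than enforcing a fixed order of questions, the user can speak naturally and the app adapts.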
Accessibility and learning ability for VUI
What does accessibility look like?
Accessibility and inclusivity with speech recognition software means designing the app to respond to inputs from a wide range of accents, pronunciations, and slang. A single input prompt will need a variety of alternate inputs that redirect to it, to account for the different ways users might say the prompt.
In the same vein, creating a user-friendly experience for all users makes for a more successful app. Users should not have to limit themselves, confine their speech patterns, or feel hindered by an accent when using speech recognition software.
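As a rough illustration of alternate phrasings redirecting to the same prompt, the sketch below maps several wordings onto one intent. The intent names, the phrase lists, and the simple substring matching are assumptions made for the example; production systems typically rely on trained language-understanding models rather than literal phrase lists.

```python
import re

# Hypothetical intent table: each intent lists several ways a user might phrase it.
INTENT_PHRASES = {
    "order_pizza": ["order a pizza", "get me a pizza", "i want pizza", "gimme a pie"],
    "stop": ["stop", "cancel", "never mind", "quit"],
}

def normalize(utterance: str) -> str:
    """Lowercase and strip punctuation so small transcription differences don't matter."""
    return re.sub(r"[^a-z0-9\s]", "", utterance.lower()).strip()

def match_intent(utterance: str):
    """Return the first intent with a known phrasing inside the utterance, else None."""
    text = normalize(utterance)
    for intent, phrases in INTENT_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None
```

With this table, "Gimme a pie, please!" and "I want pizza" both resolve to the same `order_pizza` intent, while an unrelated request returns `None` and can be handed to a fallback response.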
Managing learnability
Because a voice interface often doesn’t have a built-in visual component, learning how to use the system can be a challenge for users. Voice apps need discoverability built in to help users learn what utterances they can say to interact with the system. Sometimes these aids are companion apps and sites, and sometimes they are built into the system itself.
For example, Amazon’s Alexa sends its users a weekly email called “What’s new with Alexa” that includes a list of phrases users can try with the device.
In addition, sometimes the device itself will prompt users to try a new command. For example, after asking Alexa to complete a command such as dismissing a notification, she might say, “By the way, I can set a reminder to add this item to your shopping list. Just say ‘Add item to shopping list.’ Would you like to try that now?”
These prompts help new commands and interactions become discoverable for users.
User frustration
Sometimes users try a phrase but it doesn’t work. This might happen for a number of reasons: for example, the command isn’t built into the system, or the speech recognition doesn’t understand an accent or pronunciation.
It’s important to test and train the system thoroughly to avoid these types of issues, but they will likely never go away entirely. For that reason, it’s important to think through how to manage user frustration.
For example, our Creative Director’s daughter has many frustrating moments with Alexa, who doesn’t always understand her commands to play a particular song. The app will try, and get the song wrong multiple times, as the 11-year-old girl screams “Alexa, stop!” It’s not enough of a pain point to cause the device to be tossed out the window, but the frustration is real. Amazon’s Alexa will say “Sorry, I don’t know that one” if she really doesn’t know what to try. This softens the frustration.
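A fallback strategy like this can be sketched as a small handler that counts consecutive misses and responds a little more helpfully each time. Everything here is hypothetical: the `FallbackHandler` class and the first and third messages are invented for illustration, and only the second message is quoted from the Alexa example above.

```python
# Hypothetical escalating fallback messages; only the second is quoted from the article.
FALLBACK_MESSAGES = [
    "Sorry, I didn't catch that. Could you say it again?",
    "Sorry, I don't know that one.",
    "I'm still having trouble. You can try saying 'play music' or 'set a timer'.",
]

class FallbackHandler:
    def __init__(self):
        self.misses = 0  # consecutive utterances the app failed to understand

    def on_understood(self):
        """Reset the counter once a command succeeds."""
        self.misses = 0

    def on_miss(self) -> str:
        """Return the next message, staying on the last one after repeated misses."""
        index = min(self.misses, len(FALLBACK_MESSAGES) - 1)
        self.misses += 1
        return FALLBACK_MESSAGES[index]
```

Ending the sequence with suggested phrases ties frustration management back to learnability: a user who keeps failing is shown something that is known to work.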
Voice recognition is far from perfect. In some cases the AI doesn’t understand the utterances from the user. In the video example below, the context may have been the issue, since the user was trying to use Maps without actually moving. Equally, it could have been the instructions the user was trying to give the app and the wording or even the pronunciation.
Creating a seamless experience
With the app’s persona as a blueprint, adding in conversational flair to responses allows users to ease into conversing comfortably and naturally.
For example, voice recognition can help a user who is driving multitask without taking their eyes off the road. Google keeps its voice prompt easy to access at the bottom of the screen when a driver is using Maps. Just by tapping the prompt or using the “OK Google” trigger, a driver can ask the app to open Spotify and play a playlist, access Audible to listen to a book, or call a contact.
In some video games, voice recognition is an integral part of the game, relying on players to say certain lines into their mic to trigger events, or even imposing in-game consequences when the mic picks up sounds from the user.
Another way to make the apps more delightful and useful is to find ways the app can create an individualized experience. For example, suppose a user ordered a pepperoni pizza through the voice app and wants to order another pizza the following week. The app might remember, “You got a pepperoni pizza the last time,” and follow up with “Do you want the same thing, or do you want to try something different?”
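A minimal sketch of that kind of memory, assuming a simple in-memory store keyed by a hypothetical user ID (a real app would persist this per account), might look like the following. The function names and the default greeting are illustrative assumptions.

```python
# Hypothetical in-memory store of each user's most recent order.
last_orders = {}

def record_order(user_id: str, item: str) -> None:
    """Remember the most recent item this user ordered."""
    last_orders[user_id] = item

def greeting(user_id: str) -> str:
    """Offer a repeat of the last order if there is one, else ask an open question."""
    item = last_orders.get(user_id)
    if item is None:
        return "What would you like to order?"
    return (
        f"You got a {item} the last time. "
        "Do you want the same thing, or do you want to try something different?"
    )

record_order("user-123", "pepperoni pizza")
print(greeting("user-123"))
```

A first-time user gets the open question, while a returning user hears the personalized follow-up, which saves them from repeating details the app already knows.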
Opportunities for delight
Voice recognition software creates new opportunities for design, in which voice inputs and writing take the place of traditional visual screen design. It opens the door for a more inclusive experience, in which users who have trouble with, or cannot use, visual cues to navigate can use applications that they were once excluded from. Adding voice recognition offerings can enhance a product and delight users. With an effective design, voice recognition software can be applied to create a more usable and user-friendly
So what have we learned about designing for voice?
Designing for voice is inherently different from designing for screens. Rather than relying on visual cues, designers must think about how their users will naturally speak with an application and give that application a personality. It’s about thinking ahead to what a user will try to say as they learn an app, and helping manage frustration when they try a command and it doesn’t work.
Speech recognition applications have special design considerations, such as app personality, learnability, and managing user frustrations. Be sure to think through these needs before jumping into creating an app that could cause users to turn away from your service if it’s not designed well.