The chatbot is the poor relative of the intelligent assistant.

Definition: A chatbot is a domain-specific text-based conversational interface that supports users with a limited set of tasks.

Elsewhere, we discussed the 6 qualities of an intelligent assistant: voice input, natural-language processing, voice output, intelligent interpretation, agency, and integration of all the previous. A chatbot needs to possess only two of these attributes: natural-language processing and intelligent interpretation. In other words, a chatbot should be able to receive queries in natural language, understand and interpret them correctly, and then act on them.

In today’s world, chatbot growth and popularity are driven by at least three different factors. First, there is the hope of reducing customer-service costs by replacing human agents with bots. Second, the success of conversation-based systems like WeChat has put forward the idea of chatbots as an interaction channel between businesses and their customers, intended to supplement existing channels such as the mobile web and mobile apps. Last, the popularity of voice-based intelligent assistants such as Alexa and Google Home has pushed many businesses to emulate them at a smaller scale.

These factors map onto the two different types of chatbots available today:

  • Customer-service bots take over some of the human load in chat interactions between a business and its customers. Examples of such bots include UPS’s Ask UPS and Alaska Air’s Ask Jenn.
  • Interaction bots are intended to provide an additional channel of interaction with a business for purposes other than customer service. For example, Domino’s Pizza bot allows users to order pizza; Flo from Progressive produces an auto-insurance quote. Such interaction bots are available on a variety of platforms, the most prominent being Facebook Messenger. (Other platforms include Twitter and Slack.) However, they can also live within an app (Bank of America’s Erica) or be available through text messaging (Eno from Capital One).
UPS.com uses a chatbot for customer-service questions.

Kiehl’s Messenger chatbot: Users can interact with the chatbot either by typing text in the Send a Message textbox or by selecting one of the options displayed on the screen (Send Location or Zip Code; the Main Menu option below the text-input box).

While customer-service bots are often text only, interaction bots combine text with visual UI elements as a method of interaction.

Study

To understand the usability of chatbots, we recruited 8 US participants and asked them to perform a set of chat-related tasks on mobile (5 participants) and desktop (3 participants). Some of the tasks involved chatting for customer-service purposes with either humans or bots, and others targeted Facebook Messenger or SMS-based chatbots.

Attitudes Toward Bots

In general, people’s attitudes toward bots ranged from neutral to slightly positive. Interaction bots are a fairly obscure channel: most participants in our US studies had not interacted with such bots before and were not aware of their existence. In contrast, WeChat official-account pages are often sophisticated versions of these bots, and our Chinese participants were familiar with them. Some of our participants were pleasantly surprised to discover this interaction channel; others felt that it didn’t bring enough value compared with traditional channels such as the web or mobile apps.

Customer-service bots were perceived as generally less helpful than human representatives, but our participants also believed that they had some advantages. The most important advantage was speed: chatting with a human usually involves long wait times (either in a queue before the conversation starts, or while the agent researches a solution to the customer’s problem), whereas a bot can be instantaneous. On the other hand, people felt that access to a human shows that the company cares about its customers: “It’s really good when it’s a real person. They’re selling you an expensive item, so they have to spend more money on agents.” Another person said: “If you have a human [for customer service], it means that you care enough; if you have a bot, you should have a really good one.”

Interaction bots were usually easily identifiable as bots, but customer-service bots were harder to recognize. Some businesses do not disclose upfront to their customers that they are interacting with a bot. We believe that this is a mistake. Our study participants were pleased when the business was transparent about using a bot because they could calibrate both their expectations and their language. For example, when users realized they were talking to a bot, they tended to be more direct, use keyword-based language, and avoid politeness markers. This type of language was generally more successful than the convoluted, indirect language often used in normal conversation.

UIs for Interaction Chatbots

Interaction Style

Both text and links/buttons could be used for inputting information into interaction chatbots. They had slightly different purposes:

  • Predetermined links and buttons saved users from typing. These were usually displayed in a carousel and could include images. People appreciated having these options and even expected them for common inputs.
  • Text allowed users some flexibility in choosing the types of questions they wanted to ask and enabled them to deviate from the (often too strict) script of the chatbot.

Both of these input mechanisms were important, and both should be present.
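To make this concrete, here is a minimal sketch of how a bot might offer both mechanisms at once, assuming Facebook Messenger’s Send API and its quick replies; the access token, recipient ID, payload names, and API version below are placeholders, not a definitive integration.

```python
import requests

# Placeholders: a real Messenger page access token and recipient PSID are needed.
PAGE_ACCESS_TOKEN = "YOUR_PAGE_ACCESS_TOKEN"
RECIPIENT_PSID = "USER_PSID"

def ask_with_quick_replies():
    """Ask a question that can be answered by tapping a chip or typing freely."""
    payload = {
        "recipient": {"id": RECIPIENT_PSID},
        "message": {
            "text": "Is this address an apartment or a house?",
            # Quick replies render as tappable chips above the text box;
            # the text box itself stays active for free-form answers.
            "quick_replies": [
                {"content_type": "text", "title": "Apartment", "payload": "HOME_APARTMENT"},
                {"content_type": "text", "title": "House", "payload": "HOME_HOUSE"},
            ],
        },
    }
    response = requests.post(
        "https://graph.facebook.com/v12.0/me/messages",
        params={"access_token": PAGE_ACCESS_TOKEN},
        json=payload,
    )
    response.raise_for_status()
    return response.json()
```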

Users complained when a bot did not allow them to pick an option and instead required them to type. For example, a participant was annoyed with the Booking.com bot because it did not allow him to pick a date; he had to type it in. After typing in “thanksgiving,” he said: “It should be something obvious to prepopulate it. Options should be prepopulated — it’s tedious to do it this way. I could hit a button.”

People preferred to be able to select an option (left, Domino’s chatbot) instead of having to enter lengthy text (right, Booking.com). Consider creating buttons for the most common possible inputs.

Some bots removed the option to type text completely, forcing the user to pick one of the choices displayed on the screen. This type of design made the bot similar to a website and restricted the paths that the user could explore within the system.

Car-dealer chatbots did not allow users to enter free text once they had selected the Inventory option.

Links and buttons worked best when they were clearly signaled. People expected to be able to click on almost any nontext element that was displayed by an interaction bot. For example, when the eero Messenger bot displayed a carousel of images intended to illustrate what eero did, most of our study participants tapped them, hoping to get more information.

Eero for Messenger (left): In response to Tell me more, the bot showed a carousel with several images and a set of possible continuations (text in rounded boxes). Users expected to be able to tap on the carousel items, but those were not links. Kiehl’s for Messenger (right): The big images in the carousel were not tappable, but the links underneath them (Learn more now) were.

Besides regular buttons and links, some interaction chatbots also had a menu element that, when selected, displayed a set of possible tasks. The menu was sometimes displayed below the input text box and sometimes shown as a small hamburger icon next to it.

The Golden State Warriors bot had a Main Menu link under the input box. Selecting this link displayed a set of available tasks. When the user started typing in the input box, the menu icon was displayed next to it. The Main Menu link and the hamburger icon did the same thing.

Other links were sometimes displayed below the message box. These links were fairly discoverable: several of our users interacted with them; however, the icon next to the input box was less likely to be used.

CNN bot: Links below the message area indicated some of the main tasks in the chatbot (Top Stories, Topics, Settings).

Linear Flow

Interaction chatbots seemed to best resemble Alexa skills: they were designed to guide the user through a small number of tasks. Tasks supported by the bots are best conceptualized as linear flows with a limited number of branches that depend on the acceptable user answers. The bot asks a question, and the user’s answer advances the bot along the corresponding branch of the flow.

An example of a linear flow that a bot might advance through in order to complete a task: a decision tree for ordering food, where the type of food selects a branch and subsequent nodes capture the options for that food (e.g., crust for pizza).
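A flow like the one above can be modeled as a small decision tree. The sketch below is a hypothetical illustration (no particular platform’s API): each node asks a question, the user’s answer selects the branch to follow, and unrecognized answers trigger a reask rather than a restart.

```python
# Hypothetical sketch of a task modeled as a decision tree: leaves carry
# the completed result; interior nodes ask a question and branch on the answer.
FLOW = {
    "question": "What would you like to order?",
    "branches": {
        "pizza": {
            "question": "What crust would you like?",
            "branches": {
                "thin": {"question": None, "result": "thin-crust pizza"},
                "regular": {"question": None, "result": "regular-crust pizza"},
            },
        },
        "wings": {"question": None, "result": "wings"},
    },
}

def run_flow(node, get_answer):
    """Walk the tree until a leaf is reached; reask on unrecognized answers."""
    while node.get("question"):
        answer = get_answer(node["question"] + " ").strip().lower()
        next_node = node["branches"].get(answer)
        if next_node is None:
            # An answer outside the 'legal' set: apologize and reask, rather
            # than stopping and forcing the user to start over.
            print("I'm sorry. I seem to be having trouble understanding.")
            continue
        node = next_node
    return node["result"]

# Example (interactive): print(run_flow(FLOW, get_answer=input))
```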

When the user is compliant with the flow and provides ‘legal’ answers that are in line with the system’s expectations, without jumping steps or using unknown words, the experience feels successful and smooth. For example, several participants were able to successfully interact with chatbots from Domino’s Pizza, Wingstop, and Progressive. However, as soon as users deviated from the prescribed script, problems occurred.

For example, when asked by the Domino’s Pizza bot whether her location was an apartment or a house, a participant typed townhome, and the bot replied I’m sorry. I seem to be having trouble understanding. Another user looking for Burberry belts typed “Belt” in the message box, but received information about order cancelation. When she refined it to “Women’s belt,” she was told to select from a list of links, none of which matched what she was trying to find.

Burberry’s chatbot had trouble understanding what task the user was trying to complete and listed the tasks in the bot’s repertoire.

Bots also had trouble recovering from a problem or an unexpected input, and sometimes forced users to start over at the top of the tree, doing more work than necessary in order to obtain an answer. For example, one of our users wanted to know if Kia had any 4-wheel-drive electric models. She was forced to go through the whole decision tree for the Find a match task, answering questions such as the number of people that the car needed to accommodate and the MPG. When she answered “No” to the body-style question instead of selecting one of the displayed options, the bot simply stopped and forced her to start over.

This video shows a user interacting with Kia’s chatbot to find out if Kia carries a 4-wheel-drive electric car. (In most browsers, hover over the video to display the controls if they're not already visible.)

Some bots were, however, more flexible and were able to understand requests that deviated from the script. For example, one participant who was aware of an ongoing promotion run by Domino’s Pizza was able to have it applied to his cart. He was also able to change the crust of one of the pizzas he had ordered late in the flow.

The Domino’s Pizza chatbot was able to respond satisfactorily to requests that were not part of the main linear flow of ordering pizza. Left: The user was able to apply a deal to his cart before ordering. Right: The user changed the crust of one of the two pizzas that he was in the process of ordering.

Some bots had trouble making assumptions and establishing the context of a query. For example, one of our participants attempted to interact with Eno, Capital One’s text-message–based bot. He happened to have two credit cards from Capital One, and, each time he asked a question, the bot forced him to clarify which account the query referred to, without transferring context from one interaction to the next.

Bots were also generally not able to take advantage of previously entered information when a new task was started. For example, one of our participants first decided to order pizza for takeout; she entered her address, but she was told that Domino’s does not deliver there. She started again, this time aiming for store pickup instead of delivery. The bot asked for her address a second time, having apparently forgotten altogether that she had already entered her address.

Eno, the Capital One bot, forced the user to clarify which account he was asking about after each question. Left: Eno only recognized the keyword “transaction” (albeit misspelled) in the user query and was not able to answer the user’s specific question; instead, it gave an answer to a matching task in its repertoire. Right: Eno repeated the question asked previously.
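One way to mitigate both problems (context that does not transfer between questions, and information that must be reentered when a task restarts) is to keep per-user session state that outlives any single task. The sketch below is a hypothetical illustration; the Session class and slot names are ours, not any vendor’s API.

```python
# Hypothetical sketch: per-user session state whose slots survive across tasks,
# so an address captured during a delivery order can be reused for pickup.
class Session:
    def __init__(self):
        self.slots = {}  # e.g., {"address": "123 Main St", "account": "...7890"}

    def ask_once(self, slot, prompt, get_answer):
        """Ask for a value only if it was not captured in an earlier task."""
        if slot not in self.slots:
            self.slots[slot] = get_answer(prompt)
        return self.slots[slot]

session = Session()
# Task 1 (delivery) captures the address...
address = session.ask_once("address", "What's your address? ", input)
# ...and Task 2 (pickup) reuses it instead of asking again.
address_again = session.ask_once("address", "What's your address? ", input)
assert address == address_again  # the user was asked only once
```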

Most of the time, when bots could deal with only a subset of the possible inputs, they enumerated them upfront and allowed users to select one. In the case of the WebMD bot, however, people were unable to figure out which drugs the bot could offer information on. For example, the bot had no knowledge of the drugs Zomig or Escitalopram, but was able to answer questions about Lexapro. Presumably, the bot only worked with a subset of drugs, but the list was too long to display. This design decision rendered the bot useless: there was no way to tell in advance what types of tasks the bot would help with.

With simple linear processes that tackle complex tasks, users fear omissions. They doubt that the best answer can be obtained through the bot. For example, one of our participants worried that Flo from Progressive was not able to offer all the discounts that he was eligible for; other participants wondered whether the bots from Booking.com or Travelocity would allow them to make an informed decision on a hotel room or a flight. Indeed, bots have only limited display space available, and it is unlikely that they would be able to show users all the matches for a query. People have few reasons to trust that the results presented to them are indeed what they need. Booking.com did allow users to sort and filter the rooms according to their preferences, but, unless the user has very specific constraints, such a process is unlikely to narrow the list of candidates down to a number small enough to be handled through the bot carousel. Carousels, the UI element that bots use for showing sets of results, are simply not the best choice for displaying long lists.

Booking.com bot: There are too many hotels that match the user query; the bot displays some of the results in a carousel (left) and lets the user know that there are over 80 results in the last carousel item (middle). Although sorting and filtering are available (right), it’s unlikely that the number of hotels will be narrowed down enough for the user to pick the right one using the carousel (which is how Facebook Messenger bots display lists of results).

Language

Although in our intelligent-assistant studies we found that people tended to still use polite language when interacting with Siri, Alexa, or Google Assistant, participants in this study generally dropped the politeness markers (“Please,” “Thank you,” indirect language such as “Would it be possible to…”) if they were aware that they were interacting with a bot. There are two possible reasons for this behavior: (1) chat, as a medium, tends to be slightly more direct than face-to-face or telephone conversation, possibly because of the higher interaction cost of typing (as one user put it, “On chat, I don’t introduce myself; I don’t say ‘hi’; I sort of feel bad[…] but I don’t think I need to say ‘bye’”); (2) the chatbots were less anthropomorphized than Siri, Alexa, or even Google Assistant, which all have a human voice attached to them.

We noticed a difference between how people interacted with customer-service bots and how they interacted with interaction bots. The language used with customer-service bots was fairly complex: most of the time, people were focused on their problem and attempted to describe it; they did not think about whether the bot would be able to understand it. When they did not receive a satisfactory answer, they often reformulated the same question, without necessarily simplifying it.

In contrast, because interaction bots were usually task focused and showed a set of possible tasks in the beginning, with them people tended to use simplified questions, with fewer multiclause sentences. Some participants quickly realized that “Bots don’t seem to understand the question, just keywords,” but did not necessarily adjust their language dramatically; overall, many of our participants typed full sentences, at least when starting an interaction with a bot.

Users were generally annoyed when the bot repeated the same answers over and over again. The UPS bot warned the user that it was going to repeat an answer and offered the opportunity to connect to a real person. Owning the failure and offering an escape hatch (phone number or a live agent) were generally perceived favorably. Recognizing that a question was not understood was disappointing, but better than a blatantly wrong answer (“I like that [the Domino’s Pizza bot] says ‘I don’t understand;’ at least it’s honest”).

Bots were not always able to deal with typos or slightly ambiguous formulations. For example, Domino’s bot did not understand “popperoni” and Eno, the Capital One bot, did not get “What was my statement for March 2018?”, mistaking “March 2018” for a possible account number. Eno was, however, able to provide an answer for “Can I have my statement from March 2018?”

Eno, the Capital One bot, did not understand the slightly ambiguous use of the preposition “for” in the user’s question.
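Basic typo tolerance does not require sophisticated language understanding. As a hypothetical sketch, fuzzy string matching from Python’s standard library (difflib) is enough to map “popperoni” onto a known topping; the topping list and cutoff value are illustrative assumptions.

```python
from difflib import get_close_matches

# Illustrative vocabulary of inputs the bot knows how to handle.
KNOWN_TOPPINGS = ["pepperoni", "mushrooms", "onions", "sausage", "bacon"]

def match_topping(user_input, cutoff=0.75):
    """Map a possibly misspelled topping to the closest known one, if any."""
    matches = get_close_matches(user_input.lower(), KNOWN_TOPPINGS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_topping("popperoni"))  # -> pepperoni
print(match_topping("anchovies"))  # -> None (not close to any known topping)
```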

Some bots had a personality; for example, Flo from Progressive was playful and light. However, one user was disappointed because he felt the bot was not able to sustain that tone consistently throughout the conversation. He said: “She kind of used some phrases like ‘now we’re cooking with gas’ and made it more engaged […]; but once it got into the actual information they needed for the quote it didn’t really have it anymore; it was just straight clean cut information. It can be a little misleading. In the beginning you’re like, oh, look, it kind of has somewhat of a personality[…], but after that stopped, I was more like, this is like a chatbot, it doesn’t mean anything. It took me out of that concept of ‘oh, I am speaking with flo from progressive’ versus I am just speaking with a chatbot.”

Flo from Progressive was not able to maintain the same playful tone of voice while it gathered information for an auto-insurance quote.

Privacy

Some of our users were worried about sharing personal data such as birthday, address, credit-card numbers, and social-security numbers with interaction chatbots. They were not sure whether this data would stay private or would be shared with the platform in which the chatbot was built. In particular, in the case of Facebook Messenger, people seemed keenly aware of recent data breaches: “I am a little worried about putting my information in Facebook Messenger […] I would be hesitant to enter credit-card information; it would be nice to be able to enter it elsewhere.”

What Is the Value Proposition for Chatbots?

With customer-service bots, the value is evident: if some of the users’ questions can be successfully addressed in an automated manner, then the business will benefit. Unfortunately, if the bot is too rudimentary, people will lose trust in the company and will feel ignored and unappreciated.

Things are less straightforward for interaction bots. These bots simply replicate functionality that is already available on the web or in mobile apps. Is it worth spending time and money on this new channel? Unlikely, at least in the US and other countries where the traditional channels are already well established. As one user put it, “I don’t see myself doing [the task] in a chatbot — maybe if there was some rebate…” Companies are better off investing in the existing, well-established channels: improving the UX of a website or app will bring a higher return on investment than creating a chatbot that will get little use. We saw that even good chatbots (which are likely to require increased development and testing costs) have little chance of being discovered and considered useful.

First, most users are not aware of the existence of chat as a separate interaction channel in Messenger, Twitter, or Slack. When they think “chat”, they think “chat with a person.” Second, chatbots (as implemented today) are a fairly narrow communication channel — compared with a website, chatbots can display only a very small amount of information at a time (plus, typing is less efficient than selecting information). Third, chatbots today are still far from showing any “intelligence” — they feel like wizards: step-by-step linear processes for fairly simple tasks.

Do chatbots have any advantages? In their current embodiment, they just have one: less information overload. Some of our users were amazed that they could order pizza so easily with a chatbot. With a chatbot, there are fewer distractions and less information to attend to — as long as they stay within the chatbot’s rules, the experience will be straightforward. Problems appear when people deviate from these rules and have complex needs that cannot be addressed with a simple, linear flow. (An intelligent chatbot that could simply answer any user question would have a huge interaction-cost advantage over any web- or app-based interface. Unfortunately, we’re nowhere close.)

In theory, interaction chatbots can be useful for power users who may do the same type of tasks repeatedly, provided that they discover this channel. (The chatbot–website relationship is similar to that between a graphical user interface (GUI) and a command-line interface (CLI) in a traditional computer system: the most advanced users often prefer a CLI over the GUI for some tasks, due to the perceived efficiency of typing a few abbreviated commands instead of rooting through menus.) But then again, a well-designed website will have shortcuts in place to help power users; for example, ecommerce sites should keep users logged in and allow them to reorder items easily. One of our study participants told us: “I would not use the Domino’s chatbot… On the site, I’m already logged in, so it knows my address and payment info; I don’t know if this is any faster.”

Maybe the only true benefit of interaction chatbots is that they can serve as an experiment on the way to building a customer-service chatbot. There is no reason why some of the lessons we learn when designing interaction chatbots should not transfer to customer service.

UX Guidelines for Designing Chatbots

  • Be upfront about using a bot and not a human.
  • Clearly tell people what tasks the bot can do. Make sure you don’t create false expectations.
  • Don’t be overly ambitious: create bots for simple tasks. Complexity is not well handled in the limited bot interface.
  • Tolerate typos and ambiguity.
  • Allow people to interact with the bot both through free-text input and selection of links.
  • Allow sorting and filtering to let people narrow down results.
  • Save information from one task to the next.
  • Program some flexibility into the bot: infer context and allow people to jump forward and backward in the linear flow.
  • Be honest about not understanding. Offer an escape hatch in the form of a real human, a phone number, or a link to a different interaction channel. (A minimal sketch follows this list.)
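As a closing illustration of the last guideline, here is a minimal, hypothetical sketch of a fallback handler that admits failure and offers an escape hatch after repeated misunderstandings; the phone number and the handle() stand-in are placeholders, not a prescription for any particular bot framework.

```python
# Hypothetical sketch: track consecutive misunderstandings, own the failure
# honestly, and offer an escape hatch to a human channel.
MAX_RETRIES = 2
SUPPORT_PHONE = "1-800-555-0199"  # placeholder number

def handle(user_text):
    """Stand-in for the bot's real intent handling."""
    return f"Got it; working on: {user_text}"

def reply(user_text, intent, failures):
    """Return (bot_reply, updated_failure_count) for one user turn."""
    if intent is not None:
        return handle(user_text), 0  # success resets the failure counter
    failures += 1
    if failures > MAX_RETRIES:
        # Own the failure and offer a way out instead of looping forever.
        return ("I'm still having trouble understanding. You can reach a "
                f"person at {SUPPORT_PHONE}, or type 'agent' to chat live."), failures
    return "I'm sorry, I didn't understand that. Could you rephrase?", failures
```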