A/B Split Testing

A/B split testing allows websites to compare variations of the same web page to determine which one generates the best outcomes. The metrics used in A/B split tests are micro and macro conversions. A/B testing has become far more commonplace with the introduction of tools that require little or no involvement from developers and no other technology resources. This type of experimentation has a strong foothold among marketers and, because of its relatively low cost, it is increasingly used by user-experience designers. Sites like Google, Amazon.com, and many large e-commerce sites are known to “always be testing”, with multiple A/B tests running at any given time.
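Under the hood, an A/B test boils down to comparing the conversion rate of each variation and checking whether the observed difference is larger than chance alone would explain. The sketch below, with made-up visitor and conversion counts, illustrates one common approach (a two-proportion z-test); actual testing tools may use different statistics.

```python
# Minimal sketch of evaluating an A/B test result with a two-proportion z-test.
# The visitor and conversion counts below are illustrative, not real data.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    in conversion rate between variation A and variation B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error of difference
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))             # two-sided p-value
    return z, p_value

# Example: variation B converts 230 of 4,000 visitors vs. A's 200 of 4,000.
z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=230, n_b=4000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these example numbers the difference is not statistically significant, which is exactly the kind of result that tempts teams to keep guessing at new variations rather than investigating causes.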

Garbage In, Garbage Out

A/B testing is an empowering tool when used appropriately. However, there are potential issues with A/B testing that fall along three themes:

  1. The variation doesn’t do justice to the concept. Poor design can lead to poor conversions—that’s clear. However, each design is an implementation of a concept, and it is foolish to judge the merit of a concept via a single design implementation. Often it takes several design attempts to adequately serve a concept. For example, you may theorize that adding a description to an option will increase adoption of that option. However, if the description is presented in a manner that makes it look like an advertisement, it might be ignored by users. The concept is not bad, but the implementation is. It’s vital that the concept and the implementation are clearly differentiated.
  2. The variation doesn’t address actual causes of issues. Making incorrect assumptions about what caused an issue translates to a design variation that doesn’t actually address the problem. Tweaks to this variation will never solve the problem because the design is a response to an invalid cause. For example, you may guess that the reason for poor loan-application submission rates is that the process involves too many screens, so you condense it into one screen, but you still don’t see a lift. In reality, the issue is that users cannot find loan interest rates, and the only reason they end up on the application page is that they assume they will find the rates there.
  3. Variations are based on guesses. With A/B testing you only find the best option from among the available variations. And if the variations are based only on internal experience and opinion, who’s to say that the testing includes the best possible design?

These experimentation flaws can be mitigated by informing A/B testing with user research. When even minimal user research is conducted, we gain invaluable clues as to the potential reasons for conversion issues.

Uncovering True Causes to Define Better Variations

“A theory can be proved by experiment; but no path leads from experiment to the birth of a theory.” 

- Albert Einstein

To ensure well-executed A/B tests, the following must be defined for each experiment:

  • Cause theory: a researched explanation of why users behave the way they do and why the current design underperforms on the metric of interest.
  • Variation hypothesis: a prediction of how a specific design change will address that cause and improve the metric.

You could spend all your time generating cause theories and variation hypotheses and then A/B testing all of them: that is the brute-force approach. Haphazard A/B testing is the equivalent of throwing ideas at the wall to see which ones stick. Unfortunately, you cannot afford to do it: this approach increases the risk of user abandonment and poor experience. As you’re waiting to hit the A/B test lottery, users will interact with a suboptimal design. They may eventually decide that your site is a lost cause and never come back. You need to narrow down the number of hypotheses and deploy A/B tests cautiously and effectively; we recommend user research as the method that can help you with that.
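One lightweight way to enforce this discipline is to write down the cause theory, variation hypothesis, and supporting evidence for every planned test before building anything. The sketch below shows one possible structure; the field names and example text (drawn from the loan-application scenario above) are illustrative assumptions, not a prescribed format.

```python
# Illustrative sketch: recording a cause theory and variation hypothesis for
# each planned experiment, so tests are prioritized rather than run haphazardly.
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    metric: str                 # conversion metric the test is meant to move
    cause_theory: str           # researched reason the metric is underperforming
    variation_hypothesis: str   # how the design change addresses that cause
    evidence: list[str] = field(default_factory=list)  # research backing the theory

plan = ExperimentPlan(
    metric="loan-application submissions",
    cause_theory="Users cannot find interest rates and only reach the application "
                 "page because they expect to find the rates there",
    variation_hypothesis="Showing current rates on the application page will lift "
                         "submissions",
    evidence=["exit-survey responses", "usability-test sessions"],
)
print(plan)
```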

Four User-Experience Investigations to Improve Optimization Testing

1. Defining user intent and objections

It’s vital to understand why people visit the environment, whether they are (or believe they are) successful, and why they decide to leave. If you incorrectly assume why people come to the site, your cause theory and variation hypotheses will not reflect the reality of the environment’s utility as perceived by its users. It’s dangerous to assume user objections without investigation. For example, suppose you assume that visitors don’t take desirable actions because your prices are too high, so you lower your prices and your profit margin takes a hit. If the real reason people were not converting was not price, but a failure to understand the needs that your service addresses, you may not have a job come Monday.

How to define intent and objections: Brief on-site, on-exit, and/or follow-up surveys (via tools like Qualaroo) that ask two simple questions (a sketch for tallying the answers follows the list):

  • Why did you visit?
  • Were you successful? If not, why?
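Once responses come back, even a rough tally of recurring themes is enough to generate grounded cause theories. The sketch below shows one way to do that; the response records, field names, and theme keywords are illustrative assumptions, and in practice you would load your survey tool's export instead.

```python
# Minimal sketch for tallying survey answers into intent and objection themes.
# The responses, field names, and keyword lists are made up for illustration.
from collections import Counter

responses = [
    {"why_visit": "check mortgage interest rates", "successful": "no",
     "why_not": "could not find the rates anywhere"},
    {"why_visit": "apply for a loan", "successful": "yes", "why_not": ""},
    {"why_visit": "compare loan fees", "successful": "no",
     "why_not": "fees seemed too high"},
]

THEMES = {
    "rates": ["rate", "interest"],
    "pricing": ["price", "cost", "fee"],
    "findability": ["find", "where", "locate"],
}

def categorize(answer: str) -> str:
    """Map a free-text answer to the first matching theme, else 'other'."""
    answer = answer.lower()
    for theme, keywords in THEMES.items():
        if any(k in answer for k in keywords):
            return theme
    return "other"

intents = Counter(categorize(r["why_visit"]) for r in responses)
objections = Counter(categorize(r["why_not"])
                     for r in responses if r["successful"] == "no")

print("Intents:", intents.most_common())
print("Objections:", objections.most_common())
```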

2. Exposing interface flaws

If you overlook significant usability problems such as confusing interaction flows or misunderstood cues, you may not see conversion lifts from A/B-testing tweaks because your design variations do not address the kernel of the issue. For example, if you have a form where several of the fields ask for information that users do not feel comfortable providing, running an A/B test to see whether changing the submit-button color increases the conversion rate is a wasted effort. Understanding the true drivers of low conversion is critical to running smart, successful experiments.

How to expose interface flaws: Usability testing (remote moderated or unmoderated, or in-person) can be conducted quickly and may uncover around 85% of a site’s usability problems with only 5 users.
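The frequently cited 85%-with-5-users figure comes from Nielsen and Landauer's problem-discovery model, in which the share of problems found is 1 - (1 - L)^n, with L ≈ 31% as the average probability that a single participant reveals a given problem. The sketch below simply evaluates that curve; your own detection rate L may differ.

```python
# Problem-discovery curve from Nielsen & Landauer's model:
# share of usability problems found = 1 - (1 - L)^n,
# where L is the probability a single participant reveals a given problem.
L = 0.31  # commonly cited average detection rate per participant

for n in (1, 3, 5, 8, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} participants -> ~{found:.0%} of problems found")
```

Running it shows the familiar diminishing returns: roughly 84% of problems after 5 participants, with each additional participant adding less.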

3. Measuring findability

Navigation-label and menu-design experiments usually stem from suspected findability issues. However, poor findability can and should be confirmed before running A/B tests that directly impact information architecture and navigation. (You can learn more about information architecture in our 2-day class.)

How to measure findability: Tree tests measure the findability of elements within an existing or proposed information architecture without any influence from the interface design. They can tell you whether labels, link grouping, hierarchies or nomenclature are intuitive. If you are conflicted on naming of sections, pages, links and labels in your site, this test can identify the most problematic names and help define new labels that will improve findability. Tree tests can be performed with a tool such as Optimal Workshop’s Treejack, which allows you to generate tasks to test an information architecture.
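Tree-test results are usually summarized per task as a success rate (the share of participants who ended at a correct node), sometimes split into direct and indirect success depending on whether participants backtracked. The sketch below computes these rates from a small, made-up result set; the data format is an assumption and does not reflect Treejack's actual export.

```python
# Hypothetical tree-test results: for each task, the node each participant
# selected and whether they backtracked along the way. Format is illustrative.
results = {
    "Find current mortgage interest rates": [
        {"chosen": "Loans > Rates", "backtracked": False},
        {"chosen": "Loans > Apply", "backtracked": True},
        {"chosen": "Loans > Rates", "backtracked": True},
        {"chosen": "About Us > News", "backtracked": False},
    ],
}
CORRECT = {"Find current mortgage interest rates": {"Loans > Rates"}}

for task, attempts in results.items():
    hits = [a for a in attempts if a["chosen"] in CORRECT[task]]
    direct = [a for a in hits if not a["backtracked"]]
    print(f"{task}: success {len(hits)/len(attempts):.0%}, "
          f"direct success {len(direct)/len(attempts):.0%}")
```

Low success or a large gap between direct and indirect success on a task is the kind of evidence that justifies a navigation or labeling experiment in the first place.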

4. Cleaning up design variations before going live

The simplest application of user testing is to clean up a design by removing stumbling blocks for users. A few hours of testing usually reveals anything really bad in your design. While more advanced forms of user research have their advantages, don’t overlook the humble clean-up study. For A/B testing, you want to make sure that all the design variations get a fair chance that is not marred by usability problems that keep customers from getting the full benefits of each variation. Clean them first.

Combine Methods to Maximize Conversions

A/B testing is a wonderful tool that can, unfortunately, be poorly utilized. If A/B tests are used in lieu of research, the variations are essentially guesses. You can improve outcomes of A/B tests by incorporating UX research to improve cause identification, develop more realistic hypotheses, and identify more opportunities for experimentation.

Learn more about A/B testing in our full-day course Analytics and User Experience