Do users like systems with better usability? You might think that's a stupid question, but research shows that the full answer is a bit tricky.
To give away the answer: yes, users prefer the design with the highest usability metrics 70% of the time. But not 100%.
Measuring Preference
To operationalize the question, we must get into details. My assessment of user preference comes from a simple satisfaction question: On a 1–7 scale, how satisfied were you with using this website (or application, intranet, etc.)? Averaging the scores across users gives us an overall satisfaction measure for each design.
It's extremely important to note that we always give the test users our (very short) satisfaction questionnaire after they've tried using the design. It's completely invalid to simply show people some screens and ask them how well they like them. If people haven't actually used a user interface to perform realistic tasks, they can't predict how satisfied they'd be when actually using the system. (And real use is what matters, after all, not what people say in a survey.)
Because we measure preferences by simply asking users, the metric is inherently subjective. But it's a metric nonetheless. The question here concerns the possible relation between this subjective metric and more objective measures of system usability.
Measuring Performance
Referring back to the definition of usability, we find several measurable quality attributes that combine to form the bigger construct we call "usability." One is subjective satisfaction, as just discussed. Other — more objective — criteria include time on task, success rate, and user errors.
To calculate objective performance metrics, we basically ask users to perform representative tasks and record how long it takes them (and whether they can do the task at all).
Quantitative measures are harder to collect than simpler usability insights, so we don't include them in all our studies. Of the 1,733 sites and apps we have systematically tested at Nielsen Norman Group, we have good quantitative and subjective metrics for 298 designs.
Comparing Objective and Subjective Metrics
The following chart shows the combined objective and subjective usability metrics for the 298 designs where we measured both. Each dot is one website, application, or intranet.
We recoded the raw numbers into a uniform scale that lets us compare very different classes of systems. After all, whether it's good or bad to have a task take 5 minutes depends on how quickly users can perform that task with alternate designs. I thus calculated how many standard deviations each system scored relative to the mean of its peers. Also, I made sure that bigger scores in the chart always represented better usability. So, for example, for user errors, smaller numbers are better, so being one standard deviation below the mean error rate would be shown as a score of +1.
The y-axis shows how favorably users rated each design on the subjective satisfaction survey. To make this metric comparable with the x-axis, I also converted those raw scores into standard-deviation scores.
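To make the recoding concrete, here is a minimal Python sketch of such standard-deviation scoring. The metric names and values are hypothetical, and averaging the standardized objective metrics into a single performance score is only one reasonable choice, not necessarily the exact procedure behind the chart.

```python
import numpy as np

# Hypothetical raw measurements for one peer group of designs (not real study data).
task_times_min = np.array([3.0, 5.0, 4.0, 7.0, 6.0])      # minutes; smaller is better
success_rates = np.array([0.92, 0.78, 0.85, 0.60, 0.70])  # fraction; bigger is better
satisfaction = np.array([6.1, 5.4, 5.8, 4.2, 4.9])        # 1-7 ratings; bigger is better

def to_sd_score(values, bigger_is_better=True):
    """Convert raw measurements to standard-deviation (z) scores,
    flipped so that a bigger score always means better usability."""
    z = (values - values.mean()) / values.std(ddof=1)
    return z if bigger_is_better else -z

time_scores = to_sd_score(task_times_min, bigger_is_better=False)
success_scores = to_sd_score(success_rates)
satisfaction_scores = to_sd_score(satisfaction)

# One way to combine the standardized objective metrics into a single
# performance score per design (an assumption for this sketch).
performance_scores = (time_scores + success_scores) / 2
print(performance_scores)
print(satisfaction_scores)
```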
Thus, dots to the right of the vertical axis represent designs on which users performed better than average; dots to the left represent designs on which users performed worse than average.
Similarly, dots above the horizontal axis represent designs that users liked better than average, while dots below it represent designs that users rated worse than average in terms of satisfaction.
Correlating Performance and Preference
The red line is the best-fit regression between the two types of usability metrics. It's clear that there's a strong relation between the two, with a correlation of r = .53.
In other words, if people have an easier time using a design, they tend to rate it better in satisfaction surveys. But the correlation is not a clean 1.0, so there's more at play.
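As a rough illustration (not the article's actual analysis code), here is a self-contained Python sketch that computes a Pearson correlation and a best-fit regression line; it uses synthetic z-scores in place of the real per-design data so that it runs as written.

```python
import numpy as np

# Synthetic z-scores stand in for the 298 real designs so the example runs as-is.
rng = np.random.default_rng(0)
performance = rng.standard_normal(298)
satisfaction = 0.53 * performance + 0.85 * rng.standard_normal(298)

r = np.corrcoef(performance, satisfaction)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(performance, satisfaction, 1)  # best-fit regression line

print(f"r = {r:.2f}")
print(f"satisfaction ~= {slope:.2f} * performance + {intercept:.2f}")
```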
The paradox of subjective satisfaction is that objective and subjective metrics sometimes conflict. It doesn't happen often. Here, for example, 70% of the dots are in the expected quadrants:
- Upper right: designs on which users performed better than average and that they liked more than average.
- Lower left: designs on which users performed worse than average and that they liked less than average.
The paradoxes are the 30% of dots in the unexpected quadrants:
- Upper left: designs on which users performed worse than average, but that they liked more than average.
- Lower right: designs on which users performed better than average, but that users liked less than average.
However, there are no strong paradoxes — that is, cases in which users performed much better and strongly disliked the design, or cases in which users performed much worse and strongly preferred the design. (Such strong paradoxes would have appeared as dots in the chart's extreme upper left or lower right corners, respectively.)
Here, we find only weak paradoxes: cases in which users performed a little better and slightly disliked the design, or cases in which users performed a little worse and slightly preferred the design anyway.
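For readers who want to reproduce this kind of quadrant count, here is a minimal Python sketch of the classification, using hypothetical standard-deviation scores rather than the actual study data.

```python
import numpy as np

# Hypothetical standard-deviation scores, one pair per design (not the study data).
performance = np.array([0.8, -1.2, 0.3, -0.4, 1.5, -0.9])
satisfaction = np.array([0.5, -0.7, -0.2, 0.6, 1.1, -1.3])

# A design lands in an "expected" quadrant when both scores have the same sign.
expected = (performance > 0) == (satisfaction > 0)
print(f"expected quadrants: {expected.mean():.0%}")
print(f"paradoxes:          {(~expected).mean():.0%}")
```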
(If anyone counts the dots in the chart, they'll notice a small twist: The chart includes 298 dots, representing the 298 Nielsen Norman Group studies that measured both subjective and objective usability metrics. But the 30% paradox estimate comes from an analysis of 315 cases — that is, it includes a few more cases where I could determine the agreement or disagreement between performance and preference, but didn't have enough data to plot those last 17 dots.)
Consider Both Satisfaction and Performance Metrics
There are two practical takeaways from this data analysis:
- Performance and satisfaction scores are strongly correlated, so if you make a design that's easier to use, people will tend to like it more.
- Performance and satisfaction are different usability metrics, so you should consider both in the design process and measure both if you conduct quantitative usability studies.