Heuristic evaluation is a good method for finding both major and minor problems in a user interface. As one might expect, major problems are slightly easier to find than minor problems, with the probability of finding a given major usability problem at 42 percent on average for single evaluators in six case studies (Nielsen 1992). The corresponding probability of finding a given minor problem was only 32 percent.
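
To give a rough sense of how these single-evaluator detection rates combine when several evaluators inspect the interface independently, the sketch below aggregates them under the simplifying assumption that evaluators' findings are independent. The 42 and 32 percent figures come from the case studies cited above; the independence assumption and the evaluator counts are illustrative only.

    # Sketch: expected chance that a given problem is found by at least one of
    # k evaluators, assuming each evaluator finds it with probability p and
    # that evaluators' findings are independent (a simplifying assumption,
    # not a claim from the case studies).
    def proportion_found(p, k):
        return 1.0 - (1.0 - p) ** k

    for k in (1, 3, 5):
        major = proportion_found(0.42, k)  # 42% single-evaluator rate, major problems
        minor = proportion_found(0.32, k)  # 32% single-evaluator rate, minor problems
        print(f"{k} evaluator(s): major {major:.0%}, minor {minor:.0%}")

Under this simplifying assumption, a small group of three to five evaluators would be expected to find most of the major problems and a clear majority of the minor ones.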

Even though major problems are easier to find, this does not mean that the evaluators concentrate exclusively on the major problems. In case studies of six user interfaces (Nielsen 1992), heuristic evaluation identified a total of 59 major usability problems and 152 minor usability problems. Thus, it is apparent that the lists of usability problems found by heuristic evaluation will tend to be dominated by minor problems, which is one reason severity ratings form a useful supplement to the method. Even though major usability problems are by definition the most important ones to find and to fix, minor usability problems are still relevant. Many such minor problems seem to be easier to find by heuristic evaluation than by other methods. One example of such a minor problem found by heuristic evaluation was the use of inconsistent typography in two parts of a user interface. The same information would sometimes be shown in a serif font and sometimes in a sans serif font, thus slowing users down slightly because they have to expend additional effort on matching the two pieces of information. This type of minor usability problem could not be observed in a user test unless an extremely careful analysis were performed on the basis of a large number of videotaped or logged interactions, since the slowdown is very small and would not stop users from completing their tasks.

Usability problems can be located in a dialogue in four different ways: at a single location in the interface, at two or more locations that have to be compared to find the problem, as a problem with the overall structure of the interface, and finally as something that ought to be included in the interface but is currently missing. An analysis of 211 usability problems (Nielsen 1992) found that the difference between the four location categories was small and not statistically significant. In other words, evaluators were approximately equally good at finding all four kinds of usability problems. However, the interaction between location category and interface implementation was statistically significant and very large. Problems in the category "something missing" were slightly easier to find than other problems in running systems, but much harder to find than other problems in paper prototypes. This finding corresponds to an earlier, qualitative analysis of the usability problems that were harder to find in a paper implementation than in a running system (Nielsen 1990). Because of this difference, one should look harder for missing dialogue elements when evaluating paper mock-ups.

A likely explanation of this phenomenon is that evaluators using a running system may tend to get stuck when needing a missing interface element (and thus notice it), whereas evaluators of a paper "implementation" just turn to the next page and focus on the interface elements found there.
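
When tabulating findings along these lines, it can help to record both the severity and the location category for each problem. The sketch below is one minimal, purely illustrative way to do so; the category names mirror the four ways of locating problems described above, while the record structure, field names, and example entry are assumptions of this sketch rather than anything prescribed by the method.

    # Sketch of a record for usability problems found during heuristic evaluation.
    # The location categories and the major/minor distinction come from the text;
    # the field names and example data are hypothetical.
    from dataclasses import dataclass
    from enum import Enum, auto

    class Location(Enum):
        SINGLE_LOCATION = auto()     # visible at one spot in the interface
        COMPARED_LOCATIONS = auto()  # requires comparing two or more locations
        OVERALL_STRUCTURE = auto()   # concerns the overall structure of the dialogue
        SOMETHING_MISSING = auto()   # a dialogue element that ought to exist but does not

    @dataclass
    class UsabilityProblem:
        description: str
        severity: str      # "major" or "minor"
        location: Location

    problems = [
        UsabilityProblem("Same information shown in inconsistent typography",
                         "minor", Location.COMPARED_LOCATIONS),
    ]

    # When evaluating a paper mock-up, give the "something missing" problems a second look.
    missing = [p for p in problems if p.location is Location.SOMETHING_MISSING]

Nothing in the method requires such tooling; the point is only that tagging each problem with a location category makes it easy to check, for instance, whether "something missing" problems are underrepresented when the evaluation was done on a paper mock-up.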

Alternating Heuristic Evaluation and User Testing

Even though heuristic evaluation finds many usability problems that are not found by user testing, it may also miss some problems that can be found by user testing. Evaluators are probably especially likely to overlook usability problems if the system is highly domain-dependent and they have little domain expertise. In case studies of internal telephone company systems, some problems were so domain-specific that they would have been virtually impossible to find without user testing.

Since heuristic evaluation and user testing each finds usability problems overlooked by the other method, it is recommended that both methods be used. Because there is no reason to spend resources evaluating an interface that still contains many known usability problems, only to have them come up again, it is normally best to interpose iterative design between uses of the two evaluation methods. Typically, one would first perform a heuristic evaluation to clean up the interface and remove as many "obvious" usability problems as possible. After a redesign of the interface, it would be subjected to user testing both to check the outcome of the iterative design step and to find remaining usability problems that were not picked up by the heuristic evaluation.

There are two major reasons for alternating between heuristic evaluation and user testing as suggested here. First, a heuristic evaluation pass can eliminate a number of usability problems without the need to "waste users," who sometimes can be difficult to find and schedule in large numbers. Second, these two categories of usability assessment methods have been shown to find fairly distinct sets of usability problems; therefore, they supplement each other rather than lead to repetitive findings (Desurvire et al. 1992; Jeffries et al. 1991; Karat et al. 1992).
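
To make the complementarity concrete, the sketch below compares two hypothetical lists of problem identifiers, one from a heuristic evaluation pass and one from a user test, and reports how much they overlap. All identifiers are invented for illustration; the point is simply that when the overlap is small, the two methods contribute largely non-redundant findings.

    # Sketch: quantifying the overlap between problems found by two methods.
    # The problem identifiers below are hypothetical; only the set arithmetic matters.
    heuristic_eval = {"inconsistent-typography", "unclear-error-message", "missing-undo"}
    user_testing = {"unclear-error-message", "domain-term-confusion", "slow-task-flow"}

    overlap = heuristic_eval & user_testing
    union = heuristic_eval | user_testing
    print(f"Found by both methods: {len(overlap)} of {len(union)} problems "
          f"({len(overlap) / len(union):.0%} overlap)")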

As another example, consider a video telephone system for interconnecting offices (Cool et al. 1992). Such a system has the potential for changing the way people work and interact, but these changes will become clear only after an extended usage period. Also, as with many computer-supported cooperative work applications, video telephones require a critical mass of users for the test to be realistic: If most of the people you want to call do not have a video connection, you will not rely on the system. Thus, on the one hand, field testing is necessary to learn about changes in the users' long-term behavior, but on the other hand, such studies will be very expensive. Therefore, one will want to supplement them with heuristic evaluation and laboratory-based user testing so that the larger field population does not have to suffer from glaring usability problems that could have been found much more cheaply. Iterative design of such a system will be a combination of a few longer-lasting "outer iterations" with field testing and a larger number of more rapid "inner iterations" used to polish the interface before it is released to the field users.

References

  • Cool, C., Fish, R. S., Kraut, R. E., and Lowery, C. M. 1992. Iterative design of video communication systems. Proceedings ACM CSCW'92 Conference on Computer-Supported Cooperative Work (Toronto, Canada, November 1-4): 25-32.
  • Desurvire, H. W., Kondziela, J. M., and Atwood, M. E. 1992. What is gained and lost when using evaluation methods other than empirical testing. In People and Computers VII, edited by Monk, A., Diaper, D., and Harrison, M. D., 89-102. Cambridge: Cambridge University Press. A shorter version of this paper is available in the Digest of Short Talks presented at CHI'92 (Monterey, CA, May 7): 125-126.
  • Jeffries, R., Miller, J. R., Wharton, C., and Uyeda, K. M. 1991. User interface evaluation in the real world: A comparison of four techniques. Proceedings ACM CHI'91 Conference (New Orleans, LA, April 28-May 2): 119-124.
  • Karat, C., Campbell, R. L., and Fiegel, T. 1992. Comparison of empirical testing and walkthrough methods in user interface evaluation. Proceedings ACM CHI'92 Conference (Monterey, CA, May 3-7): 397-404.
  • Nielsen, J. 1990. Paper versus computer implementations as mockup scenarios for heuristic evaluation. Proceedings IFIP INTERACT'90 Third International Conference on Human-Computer Interaction (Cambridge, U.K., August 27-31): 315-320.
  • Nielsen, J. 1992. Finding usability problems through heuristic evaluation. Proceedings ACM CHI'92 Conference (Monterey, CA, May 3-7): 373-380.