Overlapping information categories and confusing labels are two of the most pervasive problems in website design. Fortunately, there are fast and effective techniques you can use to create categories and labels that will make sense to your audience.

The most well-known technique is probably card sorting, in which users are given a list of representative content items to group and label as they see fit. Card sorting is invaluable for understanding how your audience thinks, but it does not necessarily produce the exact categorization scheme you should follow. For example, participants in a card sort often create a generic category to hold a few items which don’t seem to fit anywhere else; this is understandable, but if you were to actually include an “other stuff” category in your menu, the same users would avoid it like the plague. (Website visitors are notoriously reluctant to click on vague labels because they quite rightly suspect they’ll have to do a lot of work to sift through the content.)

For best results, a card sort should be followed up by a tree test to evaluate the proposed menu structure.

Definition: A tree test evaluates a hierarchical category structure, or tree, by having users find the locations in the tree where specific tasks can be completed.

Tree testing is incredibly useful as a follow-up to card sorting because it:

  • Evaluates a hierarchy according to how it performs in a real-world scenario, using tasks similar to a usability test; and
  • Can be conducted well in advance of designing page layouts or navigation menus, allowing inexpensive exploration and refinement of the menu categories and labels.

To conduct a tree test, you don’t need to sketch any wireframes or write any content. You only need to prepare two things: the tree, or hierarchical menu, and the tasks, or instructions which explain to study participants what they should attempt to find.

Defining the Tree

Your tree should be a complete list of all your main content categories, and all their subcategories. Even if you are interested in testing only a specific section of the tree, excluding the other sections is risky because it assumes that users will know which section to go to. For example, if your website had both a Products and a Services category, and you chose to test only the Products tree, you would miss out on finding whether your audience understands the difference between these two categories.

Depending on what part of the hierarchy you are most interested in, your tree may need to be 3, 4, or even 5 levels deep. Include the full depth down to the lowest level of subcategories you want to test. Each subcategory should provide the full list of all the options in that area in order to elicit realistic behavior from users. Users often evaluate link labels by comparing them with nearby alternatives. For example, users interested in history might be tempted to try a category labeled Culture — but not if there was also an option for History Resources.

Competitive Tree Testing: Labels vs. Locations

If you are considering different labels for the same tree category, you may want to test two different trees in order to compare how the terms perform. Such a test is especially easy to do with Userzoom’s tree-testing tool, which allows you to randomly assign participants to different versions of the tree, in a manner similar to an A/B test on a live website. If you do test multiple trees, avoid showing the same user two alternative trees in the same session, because users’ behavior when interacting with the second tree would be skewed by their experiences with the first one.
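If your testing tool does not split participants for you, the assignment can be approximated by hand. The sketch below is only an illustration and not how any particular tool works: it hashes a participant ID to pick one tree version, so the same person always lands on the same version and never sees both. The variant names and ID format are hypothetical.

```python
import hashlib

# Hypothetical names for the two tree versions being compared.
VARIANTS = ["tree_a", "tree_b"]

def assign_variant(participant_id: str, variants=VARIANTS) -> str:
    """Map a participant to exactly one tree version, stably across sessions."""
    digest = hashlib.sha256(participant_id.encode("utf-8")).hexdigest()
    # The hash behaves like a fixed pseudo-random number for this participant.
    return variants[int(digest, 16) % len(variants)]

if __name__ == "__main__":
    for pid in ["participant-001", "participant-002", "participant-003"]:
        print(pid, "->", assign_variant(pid))
```

Hashing is used here instead of a random draw so that a participant who reopens the study link is not reassigned to the other tree.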

There’s no need to prepare and test a separate tree if you just want to compare different locations for a label, such as whether tomatoes should be placed under Fruits or Vegetables. Instead of testing a separate tree for each location, you can test a single tree and compare how many users clicked Fruits vs. how many clicked Vegetables. (You’ll also be able to tell which category they tried first, if they clicked on both.)
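As a rough illustration of that comparison, the sketch below counts how many participants visited each candidate category and which one they tried first. The click paths are made-up data, and the list-of-labels format is an assumption; a real tool's export will look different.

```python
from collections import Counter

# Hypothetical click paths for one task ("find tomatoes"), one list per participant.
paths = [
    ["Fruits", "Vegetables"],
    ["Vegetables"],
    ["Vegetables", "Fruits"],
    ["Fruits"],
    ["Vegetables"],
]

def summarize_clicks(paths, categories):
    """Return (participants who visited each category, first category tried)."""
    visited = Counter()
    first = Counter()
    for path in paths:
        hits = [label for label in path if label in categories]
        visited.update(set(hits))   # count each participant once per category
        if hits:
            first[hits[0]] += 1     # the category this participant tried first
    return visited, first

visited, first_choice = summarize_clicks(paths, {"Fruits", "Vegetables"})
print(dict(visited))        # Fruits: 3, Vegetables: 4
print(dict(first_choice))   # Fruits: 2, Vegetables: 3
```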

Preparing to Test: Tools and Formatting

You could conduct a tree test using a paper prototype (or any clickable prototyping tool), but a service designed specifically for tree testing will vastly expedite the process of analyzing your results and is well worth it. Userzoom and Treejack are both good options for conducting tree testing.

Prepare your tree in a spreadsheet, where you can easily visualize and edit it, then simply copy and paste the entire hierarchy into your tree testing tool. The spreadsheet should be formatted with your homepage in the top cell of Column A, then lower levels listed out in columns from left to right. Make sure to list only one category on each row, so that your levels will be correctly parsed when you import the hierarchy.

Screenshot of spreadsheet containing a menu tree
This spreadsheet illustrates the tree, or menu hierarchy, for the New Mexico State government website. Each category appears on a separate row, and subcategories are placed in columns to the right of the parent category which contains them.

Once you have pasted your hierarchy into the testing tool, the categories are parsed and used to automatically create a clickable menu hierarchy in which each category can be expanded to show the corresponding subcategories.

Screenshot of a tree created in OptimalWorkshop's Treejack testing tool
A tree testing tool such as Treejack, pictured above, will automatically parse your spreadsheet hierarchy into a clickable menu with categories and subcategories.
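To make the spreadsheet format concrete, here is a minimal sketch of how a pasted hierarchy could be turned into a nested structure. It is not the parser used by any particular tool. It assumes the pasted text is tab-separated, with one category per row and the column position giving the level. Apart from Starting a Business, the category names are hypothetical.

```python
def parse_tree(tsv_text: str) -> dict:
    """Parse tab-separated rows (one category per row) into nested dicts."""
    root: dict = {}
    stack = []  # stack[d] holds the children of the latest category at depth d
    for line in tsv_text.splitlines():
        cells = line.split("\t")
        filled = [(i, c.strip()) for i, c in enumerate(cells) if c.strip()]
        if not filled:
            continue  # skip empty rows
        if len(filled) != 1:
            raise ValueError(f"expected one category per row: {line!r}")
        depth, label = filled[0]
        if depth > len(stack):
            raise ValueError(f"row skips a level: {line!r}")
        parent = root if depth == 0 else stack[depth - 1]
        children: dict = {}
        parent[label] = children
        del stack[depth:]       # forget deeper branches from earlier rows
        stack.append(children)
    return root

# Hypothetical excerpt of a hierarchy, as it might look when copied from a spreadsheet.
sample = (
    "Home\n"
    "\tGovernment\n"
    "\t\tElected Officials\n"
    "\tBusiness\n"
    "\t\tStarting a Business\n"
)
tree = parse_tree(sample)
print(tree)
```

The error on rows with more than one filled cell mirrors the advice above: two categories on the same row leave the level of each one ambiguous.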

Tree-Testing Tasks

The tasks you ask users to complete are just as important as the tree itself. First you need to decide which categories and labels to target. Ideally you should include tasks which target:

  • Key website goals and user tasks, such as finding your most important product. (Success rates in your primary navigation tasks can serve as a baseline against which you can compare secondary tasks, and a reference point for future testing.)
  • Potential problem areas, such as new categories proposed by stakeholders or participants in a card sort
  • Label or location comparisons: any alternate labels or locations for the same category

For each task you write, you should also define the correct answer(s), corresponding to where the information is actually located within the tree. This information allows the testing tool to automatically calculate success rates for each task.

Example of marking the correct location for a task in Userzoom's tree testing tool
This screen from the Userzoom tree-testing system is used to indicate which category is the correct answer for a particular task.
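The success-rate calculation itself is simple. The sketch below shows one plausible version, assuming each participant's final answer is recorded as the full path to the category they chose. The data and the Licenses label are hypothetical, and a given tool may define success differently (for example, by separating direct from indirect success).

```python
def success_rate(final_answers, correct_answers):
    """Share of participants whose chosen location is one of the correct ones."""
    if not final_answers:
        return 0.0
    hits = sum(1 for answer in final_answers if answer in correct_answers)
    return hits / len(final_answers)

# A task may have more than one correct location, so correct answers are a set.
correct = {("Home", "Business", "Starting a Business")}

# Hypothetical final answers from four participants.
answers = [
    ("Home", "Business", "Starting a Business"),
    ("Home", "Business", "Licenses"),
    ("Home", "Business", "Starting a Business"),
    ("Home", "Business", "Starting a Business"),
]

rate = success_rate(answers, correct)
print(f"{rate:.0%}")  # 75%
```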

Task Phrasing

Each task should test a category label by asking the user to find something contained within that category. As with usability-testing tasks, tree testing task instructions should avoid using terms that give away the answers. Preventing priming can sometimes be accomplished by describing a scenario and motivation, but also keep in mind that users may not read the instructions carefully, and could easily miss important details if they are buried in a lengthy story.

As an example, here are a few different possible phrasings for evaluating the Starting a Business category on the New Mexico State Government tree (depicted above):

  1. Find information about starting a business.
  2. You are moving to Santa Fe next year, and once you arrive you would like to supplement your income by opening a side business providing lawn-care services. Find out what regulations you will need to follow.
  3. You are considering opening a lawn-care service. See if there are any resources on this site that can help you begin the process.

The first example gives away the answer by using the exact label term, Starting a Business, while the second is long and packed with extraneous words that a user might easily mistake for the main point of the task if they were quickly scanning. The third option avoids both the label terminology and misleading details.
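A quick automated check can catch the most obvious cases of priming before you pilot your tasks. The sketch below flags words that a task shares with its target label. It uses exact word matching with a small, hypothetical stopword list, so it will miss variants such as "start" vs. "starting", and it is no substitute for reading the tasks critically.

```python
import re

# Small, hypothetical stopword list; extend it for your own labels.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "for", "in", "on"}

def content_words(text: str) -> set:
    """Lowercase words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def primed_terms(task: str, label: str) -> set:
    """Words the task wording shares with the label it is meant to test."""
    return content_words(task) & content_words(label)

label = "Starting a Business"
task_1 = "Find information about starting a business."
task_3 = ("You are considering opening a lawn-care service. See if there are "
          "any resources on this site that can help you begin the process.")

print(sorted(primed_terms(task_1, label)))  # ['business', 'starting']
print(sorted(primed_terms(task_3, label)))  # []
```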

Limitations of Tree Testing

Tree testing is often executed as a remote, unmoderated study. After recruiting representative users, you simply send them a link to the study, and the testing tool walks them through the process of completing the tasks using their own computer. The testing tool is much better than a human would be at keeping track of exactly which categories users click on.

However, this format does not capture the full context of user behavior (such as comments made while performing a task) and you can’t ask personalized follow-up questions.

To minimize the effects of the format, conduct at least a few moderated pilot sessions before collecting the bulk of your data. In these moderated sessions you can ensure the task wording is understandable and also get a chance to pick up on nuances that might otherwise be hard to spot in the quantitative data. For example, in a recent tree test we noticed in the pilot testing that many users avoided a certain category for the first half of their session, because the label was so broad that they feared the contents would be overwhelming. This trend wasn’t noticeable in the quantitative results due to the task order randomization, but it was quite obvious when we sat through each session and saw task after task where users ignored an obvious choice. That insight alone made the pilot testing a day well spent.
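If your tool's export happens to record the order in which each participant saw the tasks (an assumption; not every tool provides this), a pattern like the one above can also be checked in the data by grouping on position within the session rather than on task. The sketch below uses made-up records, each pairing a task's position in the session with whether the participant opened the category of interest during that task.

```python
def visit_rate_by_half(records, tasks_per_session):
    """records: (position, visited) pairs, with position counted from 1."""
    halves = {"first": [0, 0], "second": [0, 0]}  # [visits, tasks seen]
    for position, visited in records:
        half = "first" if position <= tasks_per_session / 2 else "second"
        halves[half][0] += int(visited)
        halves[half][1] += 1
    return {h: (v / n if n else 0.0) for h, (v, n) in halves.items()}

# Hypothetical records from two participants, four tasks each.
records = [
    (1, False), (2, False), (3, True), (4, True),
    (1, False), (2, True), (3, True), (4, True),
]

rates = visit_rate_by_half(records, tasks_per_session=4)
print(rates)  # {'first': 0.25, 'second': 1.0}
```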

You can also partially compensate for the inability to ask follow-up questions by including a short survey after the tree test. Rather than asking users to recall any labels they found confusing, provide them with a list of labels and ask them to check which were difficult to understand. This question can be followed up with an open-ended question inviting users to share any further comments and feedback, to elicit unexpected assumptions or misunderstandings that may not be apparent from the click history.

Conclusion

Tree testing focuses exclusively on evaluating category labels. This is both its great strength and a significant weakness. Since the menu that users interact with is completely devoid of visual styling and content, the experience is significantly different from interacting with the full design. For example, a design with mega menus provides a quite different browsing experience from the one tested in a tree test, since it simultaneously displays the contents of several subcategories.

However, even these inherent limitations can often be overcome or minimized with careful data analysis. For example, for sites with mega menus, you can focus on whether the user selects the correct top-level category rather than on overall success rates.
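The top-level analysis can be sketched as follows, assuming each participant's final answer is stored as a path that begins with the top-level category they chose. The paths are invented, and all labels other than Starting a Business are hypothetical.

```python
def top_level_rate(final_paths, correct_path):
    """Share of participants who ended up under the correct top-level category."""
    if not final_paths:
        return 0.0
    hits = sum(1 for p in final_paths if p and p[0] == correct_path[0])
    return hits / len(final_paths)

def full_success_rate(final_paths, correct_path):
    """Share of participants who chose exactly the correct location."""
    if not final_paths:
        return 0.0
    return sum(1 for p in final_paths if p == correct_path) / len(final_paths)

correct_path = ["Business", "Starting a Business"]

# Hypothetical final answers from four participants.
final_paths = [
    ["Business", "Starting a Business"],
    ["Business", "Licenses"],
    ["Residents", "Moving Here"],
    ["Business", "Starting a Business"],
]

print(top_level_rate(final_paths, correct_path))     # 0.75
print(full_success_rate(final_paths, correct_path))  # 0.5
```

In this made-up data, three of four participants picked the right top-level category even though only two reached the exact location, which is the distinction that matters when a mega menu would have shown the subcategories up front.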

Overall, these limitations are a small price to pay for the benefit of quickly being able to iterate and evaluate major structural changes to an information hierarchy early in the design process. You can create a completely new tree to test just by editing your spreadsheet — with absolutely no design or coding required.