This article is a follow-up to Tree Testing Part 1: Fast, Iterative Evaluation of Menu Labels and Categories.
Tree testing evaluates the categories and labels in an information architecture. We recently explained the process for designing a tree test; once you’ve planned your study, the next step is to collect data and interpret the results. Unlike think-aloud usability testing, most tree tests are run as unmoderated studies and generate only quantitative results. This method allows you to quickly collect data from a large number of users, but it requires a different approach to extracting insights. You can’t just sit through a day of testing and jot down notes; instead, you need a systematic analysis to identify trends in the data and evaluate their significance.
Collecting Data
Study participants. Just like with usability testing, a good tree-test study must recruit representative users as study participants, particularly for products with specialized target audiences. Don’t recruit college students to test a website about life insurance.
Since tree testing allows you to easily collect data from a large group of users, aim for at least 50 users, to allow trends in user behavior to emerge and minimize the impact of any unmotivated participants who provide poor-quality data. If you plan to test two trees and compare their performance, you’ll need twice as many participants, because the comparison requires a between-subjects study design (i.e., different people test each version).
Tasks per participant. Ensure that each participant performs only 10 tasks (or fewer). Even though tree-testing tasks can be completed quickly, it’s still not a good idea to have people do 30 tasks in a row. Once someone has clicked through the same menu 15 times, they are in quite a different state of mind than an average user who has just landed at a website and may have never seen the menu before at all. If you need to test more than 10 tasks, recruit more users, and use the tree-testing tool’s randomization feature to assign only 10 tasks to each participant.
Pilot test. Finally, invite a small number of users to complete the study and review their responses before sending it to your entire group. The pilot test can expose any unintended problems with your task wording early enough to correct them.
Tree-Testing Metrics
Once the results are in, a variety of metrics capture how users understood (or misunderstood) your categories. Treejack and UserZoom, the two most common tree-testing tools, each use a slightly different style for presenting these metrics, but both provide these quantitative measures for each task in your study:
- Success rate: The percentage of users who found the right category for that task
- Directness: The percentage of users who went to the right category immediately, without backtracking or trying any other categories
- Time spent: The average amount of time elapsed from the beginning to the end of the task
- Path measures:
  - Selection frequencies for each category
  - First click: the category most people selected first
  - Destination: the category most people designated as their final answer
Depending on the type of tree and tasks in a study, some of these metrics may be more useful than others at predicting how well the information architecture will perform in real life.
Success Rate
In order to calculate the success rate, you must assign at least one ‘correct’ answer for each task. The success rate for that task indicates the percentage of users who found the correct location in the tree and identified it as the right place to complete that task. Any trials in which users selected a different final location are reported as failures. For example, if, when asked to find information about the New Mexico state library, 67 out of 100 participants selected the correct location, the success rate for that task is 67%.
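If you export the raw trials from your tree-testing tool, this calculation is easy to reproduce yourself. Below is a minimal sketch in Python, assuming a hypothetical export in which each trial records the task and the final node the participant selected; the field names and sample data are illustrative, not any particular tool’s format.

```python
from collections import defaultdict

# Hypothetical raw export: one record per participant per task.
trials = [
    {"task": "state-library", "destination": "Citizen > Education > Libraries > Library, New Mexico State"},
    {"task": "state-library", "destination": "Citizen > Government > State Agencies"},
    # ... one record per participant per task
]

# A task may have more than one correct location in the tree.
correct_answers = {
    "state-library": {"Citizen > Education > Libraries > Library, New Mexico State"},
}

def success_rates(trials, correct_answers):
    """Return the share of participants who ended each task on a correct node."""
    totals, successes = defaultdict(int), defaultdict(int)
    for trial in trials:
        task = trial["task"]
        totals[task] += 1
        if trial["destination"] in correct_answers.get(task, set()):
            successes[task] += 1
    return {task: successes[task] / totals[task] for task in totals}

print(success_rates(trials, correct_answers))  # {'state-library': 0.5} for this toy data
```

Treating the correct answers as a set makes it straightforward to give credit for tasks whose target content is cross-listed in more than one branch of the tree.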
On the surface, the success rate seems simple: higher is better. But to take action based on this metric, you first need an appropriate frame of reference to determine what a ‘good’ success rate is for both the overall tree and for a specific task.
Remember that, by its very nature, tree testing eliminates many helpful design elements, such as the search function, secondary navigation options (like related links), and any context cues from the visual design or content. Users see only the stripped-down navigation menu itself.
Because tree tests are so basic, success rates are often much lower than in regular quantitative usability studies. A 67% success rate on a tree test could easily become a 90% success rate on the final design. (However, this increase would happen only if the rest of the design was well executed; a bad search implementation or poorly designed menus can also reduce success rates below levels observed in a tree test.)
Instead of expecting to achieve a 100% success rate, use a more realistic frame of reference to evaluate what success rate is acceptable for each task, taking into account:
- The importance of that task to the overall user experience
- How each success rate compares to other similar tasks (e.g., tasks which target content at the same level in the hierarchy)
For example, consider two tasks and their respective success rates in the table below. The success rate for the food-stamps task is much lower than for the other task, but this result is partially because users must drill down three more levels to find the right answer.
| Task | Correct Answer(s) | Success Rate |
|------|-------------------|--------------|
| Where can you find directions and hours for the New Mexico State Library? | Citizen > Education > Libraries > Library, New Mexico State | 67% |
| Find the rules that determine who qualifies for food stamps in New Mexico. | Citizen > Health and Wellness > General Health and Wellness > Human Services Department > Looking for Assistance > Food Assistance > Supplemental Nutrition Assistance Program | 43% |
Rather than comparing these two success rates, it would be more realistic to compare either:
- The success rate for the food-stamps task to that of another task which also targets content that is 6 levels down; or
- The success rate of the food-stamps task performed on two different trees with different labels — one which uses the term Food Assistance and one which uses the term Food Stamps.
Directness and Time Spent
In addition to measuring how many users got to the right place, it’s important to also consider how much they struggled on the way. Two common tree-testing metrics signal this: time spent, which indicates how long it took users to find the right answer, and directness, which captures how many users went immediately to the right answer, without backtracking or changing categories. Direct navigation is also sometimes called the ‘happy path’ because it suggests smooth interaction, with minimal confusion or detours.
Tasks with high success rates can still be a poor user experience if users must retrace their steps and try multiple places before finally finding the right answer. For example, consider this task about finding the cost of tuition. Even though 74% of users eventually found the right answer, only 50% of them took a direct path; the other half had to retrace their steps at least once before locating it, although the information was actually available in 3 different locations in the tree.
Both time spent and directness give an indication of how easy a task was for users. Directness is especially important for tasks frequently done by novices or occasional users, because they won’t have the benefit of learning and remembering locations from past experience.
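Directness can also be recomputed from raw path data. The sketch below shows one way to approximate it, assuming a hypothetical export in which each trial records the ordered list of nodes the participant clicked and node labels are unique; here a trial counts as direct only if its path matches the correct path exactly, with no detours. The task, tree labels, and field names are made up for illustration.

```python
# Hypothetical correct path for a "find the cost of tuition" task.
CORRECT_PATH = ["Admissions", "Tuition & Fees", "Cost of Attendance"]

def directness(trials, correct_path):
    """Share of successful trials whose clicks match the correct path exactly."""
    successful = [t for t in trials if t["path"] and t["path"][-1] == correct_path[-1]]
    if not successful:
        return 0.0
    direct = sum(1 for t in successful if t["path"] == correct_path)
    return direct / len(successful)

trials = [
    {"path": ["Admissions", "Tuition & Fees", "Cost of Attendance"]},                  # direct success
    {"path": ["Student Life", "Admissions", "Tuition & Fees", "Cost of Attendance"]},  # success after a detour
    {"path": ["Student Life", "Housing"]},                                             # failure
]

print(f"Directness: {directness(trials, CORRECT_PATH):.0%}")  # 50% of successful trials were direct
```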
Pathways: First Clicks to Final Destinations
Success rate and directness tell you whether a category is findable; detailed pathway analysis helps you figure out how to improve categories that don’t work well.
The first click for a task is the category users select first when they begin that task. In tree testing, the first click is always a top-level category, because none of the subcategories are visible until a parent category is selected.
The first click is critical because it often predicts whether a user will eventually be successful in finding the right item. Imagine you are looking for the food court in a shopping mall. If the food court is on the top level and you start by taking the escalator down, your chances of finding it any time soon are slim. But if you start by going to the right level, chances are you’ll be able to wander around a bit and find it, if only by the smell of food.
The first click operates in the same way. Once users get in the general vicinity of the correct category, context cues and local navigation make it more likely that they find it. But incorrect first clicks are often disastrous; the table below shows the first-click data for a task which had only a 20% success rate. The correct top-level category, Directory, received only 14% of the first clicks. Instead, users started in the Program or School sections, and most ended up wandering around those areas and never making it back to the Directory.
Examine the first-click data carefully when:
- A task has a low success rate and/or low directness. The first clicks indicate where users initially expected to find that information, and suggest locations where the item should be moved (or at least cross-listed).
- The final design will use mega menus that expose both the 2nd and 3rd level categories at a glance. The ability to see and compare multiple sublevels simultaneously can drastically improve success rates above what you would observe in a tree test – but this only works if the first click is successful, and users make it to the right mega menu.
If you have many tasks where first clicks are distributed across multiple categories, you may have too many overlapping categories. Do a card sort, or review the tree-test results again and look for other possible organization schemes.
Review the final destinations selected by users when the first clicks are correct, but the success rates are low. This pattern suggests that lower-level categories overlap too much.
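To spot these patterns across a whole study, it can help to tabulate first clicks per task and flag tasks where no single top-level category attracts a clear majority. Here is a minimal sketch, again assuming a hypothetical per-trial export with an ordered click path; the 50% threshold, task names, and example data are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict

def first_click_summary(trials):
    """For each task, count how often each top-level category received the first click."""
    by_task = defaultdict(Counter)
    for trial in trials:
        if trial["path"]:
            by_task[trial["task"]][trial["path"][0]] += 1
    return by_task

def scattered_tasks(summary, threshold=0.5):
    """Flag tasks where the most popular first click drew less than `threshold` of participants."""
    flagged = []
    for task, counts in summary.items():
        total = sum(counts.values())
        top_category, top_count = counts.most_common(1)[0]
        if top_count / total < threshold:
            flagged.append((task, top_category, top_count / total))
    return flagged

trials = [
    {"task": "faculty-email", "path": ["Programs", "Graduate Programs"]},
    {"task": "faculty-email", "path": ["Schools", "School of Arts"]},
    {"task": "faculty-email", "path": ["Directory", "Faculty & Staff"]},
]

for task, top, share in scattered_tasks(first_click_summary(trials)):
    print(f"{task}: most common first click was '{top}' with only {share:.0%} of participants")
```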
Turning Data into Action
Although tree testing yields quantitative data, the conclusions are by no means black and white. Task success rates are just the first step, and must be interpreted within the context of how much users struggled to get to the right answer (directness), and where they expected the right answer to be (first clicks).
Once this analysis is complete you can identify appropriate solutions. For example:
- When first clicks are evenly distributed across multiple areas, list topics in multiple categories. If this issue occurs for many tasks, consider changing the overall organization scheme.
- When success rate is low but first clicks are correct, change the labels of subcategories to be more distinct.
Learn more about selecting organization schemes and labels in our full-day course on Information Architecture.