In our study of how images, GIFs, and videos assist learning, we observed that small details in those visuals can impact users’ perception of the content and even their success. Sometimes, seemingly minor details disrupted focus and sparked a short rant. Other times, it was only when asked what they thought about an image or video at the end of the task that participants mentioned the background music, the intonation of the narrator’s voice, or the style of the prop as reasons for finding the video boring or overly promotional.

In this article, we discuss such minutiae of multimedia as part of the broader message that graphics and videos should be helpful and meaningful instead of purely decorative.

No Mental Rotation

Consider how exactly people will use the image or video and what information they need from it. That should drive what the visual shows and from what perspective. For instructional content, you may need to literally show things from the user’s perspective: what does this process look like through the user’s eyes?

For example, one study participant struggled to complete a particular napkin fold, even after watching the video several times and looking through a step-by-step image-gallery version of the instructions. While attempting to fold the cloth napkin, she often rotated the napkin around the table before either revisiting a previous step or moving on to the next. When she eventually found a video she could follow successfully, she explained: “It helped a lot because [the video] was from my perspective, rather than me having to mirror someone. Her hands were right here rather than them coming at me.”

2 screenshots showing images with opposite camera angles.
Left: Viewing the napkin from an observer’s perspective meant the user needed to mirror the steps shown in order to accomplish them, which made the task more difficult. Right: The camera angle matched what the user would see when completing the steps, and so was easier to follow.

Mental rotation takes cognitive effort and can be avoided by choosing the optimal camera angle for demonstrating a process. If the user merely needs to understand what an object (such as a chimney starter) is, then showing it from an observer’s perspective would suffice. However, for step-by-step processes, filming the steps from the doer’s perspective allows the user to replicate those steps easily.

Get to the Point, NOW

People’s attention wavers quickly, even when they are initially interested in the video content. Many of our study participants regularly hovered over video controls while the video was playing to check how much time was left. This behavior often began about 30 seconds into the video — presumably when viewers were deciding whether to continue watching.

Long videos, particularly those with lengthy introductions, were a common complaint among the self-proclaimed video haters: “I don’t really like watching videos, I don’t know why that is. I feel like they go on forever.” Another stated: “I feel like videos take forever to be like, ‘First, open the door.’ … They don't need to be that drawn out.”

While text allows users to scan or read at their own pace, videos progress at a set speed, and holding people’s attention for any length of time is difficult. (This is the main reason we recommend using video as a secondary source of information, and not as a replacement for text.) One participant tasked with finding out how to replace the filter on an air purifier put it plainly: “about halfway through the video … I stopped caring that much.”

When a video answered participants’ questions quickly and clearly, they appreciated it. Sure, people would often then stop playing the video before it ended, but because it was a good experience, they were more likely to watch other videos on the site or at least leave with a good overall impression of the helpfulness of the content.

Keep Videos Short

When the video acts as an overview of a process and not a tutorial, make it as short as possible; under 30 seconds is a good goal. For instance, one user in our Seniors study was looking at recipes for breakfast. While on the Tasty website, he watched a video accompanying a crepe recipe and commented, “It’s a very nicely done video. And they’re short enough. The whole recipe, they show in less than 30 seconds and then you can find out more details if you want.” The 30-second video worked well because it was intended to draw people in and quickly summarize the recipe steps rather than offer a step-by-step guide.

When appropriate, several short videos will be better than a single long one because users will be able to play only those they are most interested in. For example, when asked to research the difference between mutual funds and exchange-traded funds (ETFs), participants on the Vanguard site were able to watch the provided videos in the order they thought would best answer their questions. Because the videos were seen as helpful, most participants eventually watched all 5 of them (or, at least, a portion of each video — some videos were stopped after 1:30, or even after just 20 seconds, once the user felt they had their answer). After finishing her task, one participant decided to play the last video, saying “I’m just going to finish it out, because it’s been pretty informative.” Afterward, she commented, “Every video is short, which I like. Even though total they’re over that 10-minute mark, individually I would listen to them all.” (She had previously stated that a “super long” video was anything 10 minutes or longer.)

Screenshot of Vanguard webpage with 5 video thumbnails.
The Vanguard site provided 5 short (3–4 minutes) videos answering various top questions about ETFs. These short videos were appealing to participants in our study because each represented a small time commitment; most participants ultimately played all of them.

On the other hand, for complicated processes like troubleshooting or napkin folding, where people will likely follow along, a slow pace (and thus a longer video) may be better. When the video progressed too quickly, users had to constantly pause it or rewind to catch up with the steps, and so they appreciated videos that were slow enough that they could keep pace. Of course, everybody is different, and it’ll be difficult (perhaps impossible) to find a speed that pleases everyone. For this reason, it’s ideal to also provide images and text for step-by-step processes so people can follow each step at their own pace. One participant literally said, “What I want is this video, but per step. That would be great.” (Note: GIFs were effective for quick animated instructions broken out into individual steps; participants liked that, unlike with videos, they didn’t have to commit to playing the GIFs.)

Details Can Distract

Paradoxically, even if their attention wavered during videos, many people were critical of small — even insignificant — details, which sometimes distracted from the main purpose of a video.

For example, when watching a video describing how to set up a charcoal grill, a participant paused and rewound it to point out something that was in the actor’s hair in a single overhead shot: “Oh, whoa. Let me go back a little bit … Check it out, there’s something in her hair right there. … I wonder what it is.” She later added, “It’s a little distracting with that dot on her head.” (The same issue would certainly occur for an image as well, though a single photo is often easier to edit than every frame within a video, and so these distractors are more easily avoided.)

Screenshot of video paused, showing speck of white dust in the actor's hair, visible in the lower left of the video frame.
One participant on the Home Depot site got distracted by a speck of dust that appeared in the actor’s hair (even though it was out of focus!) during an overhead shot showing how to set up charcoal in a grill.

While these types of small accidents can be hard to catch, other details, like the choice of props, are easily controlled. For this video, several participants commented that they weren’t sure if the demonstrated steps applied only to the somewhat fancy grill used in the video and that they wished a common, basic style of grill had been shown instead.

The tone of background music for a video is another detail that can affect users. For example, one participant was looking through the help and support documentation for a meeting scheduling tool, Calendly, and watched the video provided for learning about managing availability. Afterward, she commented that the tool seemed easy to use, but the video appeared to be “less of a how-to, and more of a sales-type of video.” Part of that perception came from the music and the narration: “that little, do-do-do-dooo thing [background music] is a little, I don't want to say childish, but it feels cheeky or something. And the person talking, I know they're trying to be helpful, but it comes off as, 'Hey idiot, you can do this by …' almost like, infomercially.”

The Calendly Help Center included videos with “cheeky” background music and an overly peppy narration that led one participant to comment that the videos seemed focused on selling the software rather than on teaching how to use it.

That said, small details can also be highly appreciated. The same user who complained about the speck of dust visible in the actor’s hair also commented that she appreciated that the actor was wearing Converse sneakers, because they “make [the actor] hipper, more approachable, not a dad-bod thing.” Similarly, a study participant said that one main reason he enjoyed watching Tasty recipe videos on Facebook was that “the guy’s voice [at the end] was very appealing. I would sometimes watch it just to hear him say, ‘Oh yeah’ because you knew he was going to enjoy whatever he was eating.”

Consistency — maintaining the same (or at least, a very similar) camera angle throughout a video or series of images and using the same props, actors, and backgrounds — was another detail that people appreciated (not surprisingly, since consistency helps them focus on the important points in the video instead of being distracted by the novelty of the props). For example, a participant on the Cooking Channel’s site stated that she liked the images on a page about charcoal grills because they had “continuity, in terms of like, all the pictures are of the same grill, with the same person, with the same chimney.”

2 screenshots of a picture gallery, showing images that were consistently photographed.
Cooking Channel: Images in a picture gallery were appreciated for being consistent, by using the same camera angle, grill, and other major props throughout.

Conclusion

Even when a video is intended to be informational or instructional, it must still be entertaining and manage viewers’ attention wisely. Getting to the main point quickly and keeping the overall length short (or breaking a topic up into several short points) are critical for video content.

Details such as continuity and consistency, the intonation of the narrator’s voice, props, background music, and basically anything that is visible in the video or graphic must be scrutinized and designed thoughtfully, or they can easily distract users and change their perception of the content.

Videos, animated GIFs, images, and other multimedia components should not be added as “fluff” or to “jazz up” a website, app, or other medium. They are user-experience elements, and must be filmed, designed, and produced with an eye toward usability and interaction.

Keep the users’ goal in mind. For graphics that people will be referencing while they complete a step-by-step process, ensure that the camera angle shows what users will see, from their perspective, as they follow along. Images or GIFs with text instructions per step are easier to consume than videos, because they allow people to go at their own pace and fully understand one step before being rushed to the next.