While the world is currently locked down due to COVID-19, many businesses that rely on physical spaces have turned to virtual tours to provide a sense of the space for users who are currently unable to visit. Especially in real estate, there has been a lot of recent emphasis on virtual tours of homes. Many other types of business, such as cultural institutions, universities, wedding venues, and even outdoor attractions, have followed suit.

This technology has been slowly maturing in the background for years, and many users have been exposed to the basic interaction paradigm through popular examples such as the Street View feature within Google Maps.

We conducted a qualitative study with 16 users to find out the good, the bad, and the dizzying aspects of modern virtual tours.

We tested a variety of websites from industries that use virtual tours: real estate, wedding venues, outdoor adventures (national parks, scuba diving, etc.), cultural institutions (such as museums and art galleries), theme parks, and university campuses (soon to be covered in a separate article). Among all these websites, several types of virtual tour were commonplace:

  • Free-movement 3D walking tours
  • A series of 360° photos linked together (much like Google’s Street View)
  • 360° videos
  • 2D video tours offering a guided (but noninteractive) tour of a space

VIDEO: Artland offers a 3D gallery tour that enables users to move relatively freely through the virtual space. This type of tour had the most coverage, but still had fixed points where 360° images were taken.

The Vatican museum’s virtual tours offer a single 360° photograph per room, but link between rooms with an arrow icon on the floor.
The National Marine Sanctuaries offers video tours shot in 360°, so the user can pan around as the video plays but cannot move independently.

Virtual Tours Typically Used for Checking Details Later in the Process

In our study, users often noted the presence of virtual tours on sites that offered them, and used phrases like, “if I were interested, they offer a virtual tour, which is nice.” While the positive sentiment toward the option to take a virtual tour was fairly consistent, many users then went about their business without interacting with the tour, demonstrating that they were not actually very interested (and further proving one of the first rules of UX — “don’t listen to your customers, observe their behavior instead”). Indeed, in many of the test sessions, users did not interact with the virtual tours until later in their visit on the site or when directly instructed to open the tour by the study facilitator. (A skilled facilitator will prompt participants to use the feature of interest only at the end of the task, to avoid bias.)

In fact, especially when the task involved gathering information for consequential decisions, such as buying a home, booking a wedding venue, or choosing a university, participants chose to interact with standard photo galleries, text descriptions, and even prerecorded video tours before using the 3D virtual tours. Their comments showed that they expected the virtual tours to take effort to use and they preferred to start by looking at photos to decide if the property, artwork, or physical space was interesting enough before investing in a virtual tour. Thus, as is often the case with such intricate features, users were doing an (unconscious) cost-benefit analysis of the virtual tour, weighing the expected interaction cost against the extra information delivered by the tour. A study participant summed this up well:

“I don't see a point of just the 3D tour without photos — [photos] are primary and [a 3D tour] is secondary”

This preference for still photos as the initial touchpoint was due to two factors — (1) photo galleries can be quickly swiped through to see a wide variety of views within the space in a short time, and (2) photos afford a degree of direct access to certain details (as one doesn’t need to walk through the whole house to look at a particular bedroom). As one study participant put it,

“The photos are like an advertisement to entice me to go in. If I take the tour, I'm investing a lot more time, so it's good for looking at all the nooks and crannies. The 3D tour is when I start to fall in love and want to see the details.”

However, all participants noted that, once they reached the point in their decision making where they wanted to get into details, a virtual tour was superior to photos for truly getting a sense of the place, warts and all. Several users noted that (especially in real estate or for renting event spaces) still photos taken with professional equipment can be rather misleading, making a space look larger, more glamorous, or better lit than it is in reality. One user explained:

“I do like that it allows me to move around freely in the home, so I can see the spatial relationships. This gives me a much better idea what it’s actually like in, say, a bedroom, because seeing it in this 3D tour, the room looks much smaller than it did in the [still] photos. If you get the right camera lens and angle [with a still camera], you can make a room seem much larger than it really is.”

Real-estate and event-planning users often sought out virtual tours to check on details such as:

  • The overall flow of the space, from room to room
  • How big each room felt
  • The condition of windows, flooring, and fine details such as crown moldings
  • The type and condition of appliances
  • The number of power outlets in a room
  • Quality of light and views from windows
  • How many people could comfortably fit in a space
  • What type of furniture could fit and how it could be arranged

Virtual Tour Users Craved Expert Guidance

Many users in the study also wished for a traditional, narrated 2D video as a secondary step (in between viewing a photo gallery and taking a 3D tour) to give them a guided, expert walkthrough, and get them excited about the space. Several users noted that watching a video involves low effort, but provides similar benefits for understanding the flow of the space and getting to see some detail. They noted that an expert guide (such as a realtor, wedding planner, museum docent, or park ranger) could offer up useful information at the right moment — often things that users wouldn’t even know to ask.

This desire for credible, expert guidance was pervasive in all types of virtual tours — people wanted someone who had expertise to share the key details about the home, venue, artwork, or national park rather than exploring by themselves.

Surface-Level Delight Fades Quickly, and Users Move On

While many of the visually impressive virtual tours elicited a substantial wow factor in study participants, the initial delight quickly subsided. Many users exclaimed, “Oh, this is so cool” mere seconds before closing the tour and moving on to something else, such as a photo or prerecorded video.

The quick dissipation of superficial delight was most prominent on leisure tours — national parks, art galleries and museums, zoos, tourist attractions, and cultural institutions such as the Vatican or Buckingham Palace. Initial excitement was followed quickly by a shrug and dwindling engagement.

The leisure tours that were engaging for the longest were those that limited the interaction cost of users and presented a somewhat guided or curated experience. These tended to be 360° videos or photos that offered substantial narrative (either literal audio voiceovers or written text presented at key contextual moments).

The Wizarding World of Harry Potter at Universal Studios’ virtual tour elicited short-lived delight in one user, who quickly grew bored and moved on.

Moving Within the Space Is Slow and Effortful

One of the biggest problems users encountered was slow and difficult movement through the virtual space; turning around in particular was incredibly effortful. For example, a mobile-device user “walked” into the bathroom in a virtual house tour and grew frustrated trying to turn around to go into another room. He noted, “for me to turn around and get out of [this room], that was 8 thumbs [referring to the number of swipe gestures he needed to turn 180°] […] Which is fine, but to explore a whole house in this kind of model is a little frustrating.”

A user noted his frustration with how much effort it took simply to turn around and leave a room; he said, “This took 8 thumbs [swipes] just to turn around.”

In many ways, this experience resembles the crude 3D video games of the 1990s; in fact, several participants compared virtual tours to the 1993 video game Myst. While video games have progressed dramatically since then, virtual tours are still stuck in a very similar interaction paradigm. Loading times are slow, the number of spots with 360° “coverage” is often limited, and movement speeds (both turning and moving forward or back) are capped to ensure that users aren’t given vertigo.

In a physical space, users can (unconsciously) choose how quickly they turn their head and their walking speed. This is thanks to the unsung “sixth sense” called proprioception, or the ability to be aware of one’s own body’s position and movement in space. Modern video games offer a limited, but still powerful, ability to control movement speed through the common dual-joystick control system (the left joystick moves the character through space, and the right joystick controls the camera angle). How far the joystick moves from the center controls how quick the movement is. Even though this process is more conscious than quickly turning one’s head, it’s still a relatively intuitive design. Even mobile games such as Fortnite have usable controls for moving a character through a 3D space using a virtual version of the dual-joystick setup. Yet, virtual tours seem to be stuck in an outmoded control paradigm.
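To make the joystick idea concrete, here is a minimal sketch of how deflection could be mapped to turning speed. It is an illustration only, not the control scheme of any particular game or tour product: the function names, the 120°-per-second maximum, and the 10% dead zone are all assumed values chosen for the example.

```python
import math


def turn_rate(deflection: float,
              max_rate_deg_per_s: float = 120.0,  # assumed cap, not from the study
              dead_zone: float = 0.1) -> float:    # assumed dead zone, not from the study
    """Map a joystick deflection in [-1, 1] to a camera turn rate in degrees/second.

    The sign of the deflection gives the turn direction; the distance from
    center gives the speed.
    """
    deflection = max(-1.0, min(1.0, deflection))  # clamp out-of-range input
    magnitude = abs(deflection)
    if magnitude < dead_zone:
        return 0.0  # ignore tiny, unintentional movements near the center
    # Rescale so speed ramps from 0 at the dead-zone edge to the cap at full deflection.
    scaled = (magnitude - dead_zone) / (1.0 - dead_zone)
    return math.copysign(scaled * max_rate_deg_per_s, deflection)


def seconds_to_turn(angle_deg: float, deflection: float) -> float:
    """Time needed to turn by angle_deg while holding the stick at a fixed deflection."""
    rate = abs(turn_rate(deflection))
    return math.inf if rate == 0.0 else angle_deg / rate
```

Under these assumed values, holding the stick fully to one side turns the camera 180° in a single continuous motion, whereas a swipe-to-pan control of the kind participants struggled with requires a separate gesture for each small increment.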

Changing panning sensitivity isn’t the answer, however. On tours that panned with a higher velocity, users disliked the twitchiness and felt dizzy and out of control. The same problems plagued virtual tours that used an AR interaction model, taking advantage of the mobile device’s gyroscope to track the user’s position in space and move the tour around: all participants decried this experience as nauseating and twitchy, and complained about “gorilla arm” (fatigue from keeping one’s arm extended).

Poor Wayfinding

Another major issue was related to navigation and wayfinding. Users frequently complained that they weren’t sure where they currently were (i.e., which specific room they were viewing) and what other rooms were nearby. While most participants quickly understood the common signifiers that indicated that you could move to a different perspective spot (a circle or arrow superimposed on the ground was reasonably clear), they often didn’t know what room they would end up in if they clicked that icon. Even more to the point, when multiple arrows appeared near one another (as was common on tours hastily adapted to mobile devices), the clarity of the signifier was reduced.

The Google Maps mobile virtual tour of the Van Gogh Museum in Amsterdam showed too many navigation-arrow signifiers within a small space and confused several participants. This tour also suffered from a dizzying, shaky experience for mobile users, as it used the mobile device’s position to control the camera. All our study participants decried this aspect.

Some tours solved this wayfinding problem by labeling each arrow with the room it led to. This approach worked well for tours that only had one 360° image per room; however, for virtual tours offering relatively free movement within the space, it would be deeply unwise to have that many labeled arrows. When free movement is enabled, place the text label for room names so that it is not in the way (e.g., on thresholds between rooms).

Zillow’s 3D tour of homes featured text labels telling users which room they were currently viewing and which rooms they could “walk to” from there. A filmstrip-style navigation component along the bottom of the screen allowed users to teleport to other rooms throughout the house.

In addition, moving to a specific room was often a slow affair. Many virtual tours force users into a linear-access paradigm that mimics physically walking through the space and going from one room to the next. While that’s useful when “getting the vibe” and understanding the flow of a space, if you find yourself on the third floor and wish to jump back to the basement to double-check something, direct navigation would be very helpful.

Some tours enabled users to teleport from one part of a space to another. One approach was a filmstrip-style gallery of labeled images at the bottom of the screen to allow for faster access to specific rooms. Others offered a bird’s-eye floorplan view or a 3D dollhouse view that allowed for zoomed-out context and fast navigation. Still, these solutions were plagued by buggy, stuttering performance, and users often became disoriented and lost their spatial awareness when zooming in or out of these high-level views. The filmstrip-style navigation was frequently ignored; many of our participants interacted with it only after being asked about it (at the very end of a task or session, so as to not prime the user beforehand).

VIDEO: The dollhouse view offered by Matterport virtual tours elicited an initial delighted response from many users, but was a somewhat confusing method of navigating within a space. Often, when switching back from the dollhouse view to the standard tour view, users would teleport directly up against a wall and needed several seconds to orient themselves spatially.

Opportunities for “Better than Reality”

In many ways, it’s a shame that virtual tours suffer so many basic interaction problems, as there is a huge opportunity for these tours to go beyond an in-person visit. Users frequently wished that the tour would reveal important, contextually relevant details (such as room size or whether it’s been recently updated), like an experienced realtor or tour guide might.

While some tours offered measuring tools, these tools were plagued by frustrating mode changes, poor icon signifiers, and a demanding degree of precision. Measuring tools made the user tap a series of points in the virtual space in order to measure the distance between them; however, to get accurate measurements, users had to tap very precisely, a nearly impossible feat on a mobile device and still very difficult on a desktop computer with a mouse. Users often bemoaned that the tool should provide basic dimensions of the room (e.g., ceiling height, length, and width) by default, without requiring the user to do all the work!

Matterport virtual tour: A study participant attempting to measure the width of the space between windows struggled to position the virtual measuring tape precisely at the intended start and end points.
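The difference between tap-to-measure and dimensions shown by default can be sketched in a few lines. This is a hypothetical illustration, not how Matterport or any other tour product works: `Point3D`, `tapped_distance`, and `default_room_dimensions` are made-up names, and the sketch assumes the tour already stores the room's corner points in meters in an axis-aligned coordinate system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    """A point in the tour's 3D model, in meters (assumed units)."""
    x: float  # left to right
    y: float  # front to back
    z: float  # floor to ceiling


def tapped_distance(a: Point3D, b: Point3D) -> float:
    """Tap-to-measure: straight-line distance between two user-selected points.

    The result is only as accurate as the two taps, which is why
    participants found this approach so demanding.
    """
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def default_room_dimensions(corners: list[Point3D]) -> tuple[float, float, float]:
    """Width, depth, and height of the box enclosing the room's known corners.

    Uses stored model geometry, so the user never has to tap precisely.
    """
    if not corners:
        raise ValueError("at least one corner point is required")
    xs = [c.x for c in corners]
    ys = [c.y for c in corners]
    zs = [c.z for c in corners]
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
```

In this sketch, the first function puts the precision burden on the user, while the second lets the tour surface room size and ceiling height as soon as the user enters a room, which is closer to what participants asked for.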

While there’s no completely universal list of information that all users want in each room of a virtual tour, domain-specific themes are easy to identify. Home buyers want to know the size of the room, the number of power outlets, how well sound is insulated, the brand (and age) of appliances, and so forth. Those planning weddings want to know how many people can comfortably fit in a room for a meal, how many people reasonably fit on a dance floor, and what the space looks like both during the day and the evening. People virtually visiting museums and cultural institutions want some information on what they are seeing and why it’s important. For virtual tours to truly be better than reality, these needed details should be provided by default.

Users of these virtual tours craved a guided, expert-led experience: not an overbearing audio narration, but thoughtful details revealed on demand. For example, one virtual art gallery focused on a single piece of art. Rather than taking a 3D space-based approach, it offered an interactive tour of just one painting, zooming in to various parts and offering expert information about brushstrokes, symbolic themes in the painting, and how the painting related to the artist’s life and other works. This example demonstrates that moving in space is not a precondition of an effective tour; rather, what makes a “tour” valuable to users is rich detail and meaningful context.

Google’s Arts and Culture site provides a guided tour of Frida Kahlo’s Self-Portrait with Monkey. The tour focused not on a virtual gallery space, but on high-resolution images of the painting, zooming in to various parts of the work and providing textual guidance on the significance of those aspects that would likely be missed by viewers without art-history expertise. This tour was among the best-received cultural or discretionary-activity tours in our study.

Summary

COVID-19 lockdowns have forced many businesses that rely on access to physical locations to adapt quickly. Unfortunately, most virtual-tour software follows 1990s video-game interaction paradigms and is largely unsatisfying for many users. Still, there are opportunities to provide valuable experiences by focusing on guided, expertise-driven tours, rather than freeform exploration of a 3D space. A virtual tour should be a secondary or tertiary source of information for users after high-quality still photography, well-written descriptions, and even traditional video tours.