Whenever I show double-logarithmic charts in my usability seminars, I see the audience members' eyes glaze over. People don't like anything but the simplest data visualizations, and I've certainly learned my lessons from the feedback sheets and scaled back on the amount of statistics I present.

Still, I can't help myself: there's data underlying the usability guidelines, and I have to show some of it. To understand traffic patterns found by Web analytics, for example, some of those hated advanced-visualization plots are sadly necessary. Without them, you simply can't tell what's going on.

As an example, consider the following linear graph of my log file analysis of how many visitors each page gets on a given website:

Traffic diagram with linear axes

This linear graph shows what looks like the classic "long tail" distribution (which is really Zipf's Law). And indeed, it almost is. The difference between theory and practice becomes clear, however, when we plot the same data on logarithmic scales:

Traffic diagram with logarithmic axes

It's now clear that we have a drooping tail: the site simply doesn't have enough content to supply the predicted demand at the low end.
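To make the idea concrete, here is a minimal sketch of how a droop can be detected numerically rather than by eye. Under Zipf's Law, traffic plotted against page rank is a straight line on log-log axes, so the sketch fits that line to the most popular pages and compares the tail against it. The data is synthetic and invented purely for illustration (it is not the site's log file), and the function names, the 100-page cutoff, and the droop formula are all my assumptions.

```python
import math

def fit_loglog_line(ranks, views):
    """Least-squares fit of log10(views) = intercept + slope * log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(v) for v in views]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predicted_views(rank, slope, intercept):
    """Traffic the fitted straight line predicts for a page of the given rank."""
    return 10 ** (intercept + slope * math.log10(rank))

def synthetic_views(rank):
    """Made-up traffic: pure Zipf for the top 100 pages, drooping below the line after that."""
    zipf = 100_000 / rank
    if rank <= 100:
        return zipf
    return zipf * (100 / rank) ** 2  # hypothetical droop, chosen only for illustration

ranks = list(range(1, 1001))
views = [synthetic_views(r) for r in ranks]

# Fit the line on the head of the distribution only, where Zipf's Law holds.
slope, intercept = fit_loglog_line(ranks[:100], views[:100])

# A ratio near 1 means the page sits on the line; well below 1 means a drooping tail.
for rank in (10, 100, 1000):
    actual = views[rank - 1]
    expected = predicted_views(rank, slope, intercept)
    print(f"rank {rank:>4}: actual {actual:>9.1f}, line {expected:>9.1f}, ratio {actual / expected:.2f}")
```

On a linear chart, the low-ranked pages are squashed against the axis and look the same whether they sit on the line or far below it. The ratio of actual to predicted traffic is what the log-log plot lets you see at a glance.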

Without this fancy log-log plot, we would never have seen the site's potential for increasing traffic by adding large amounts of low-volume content. I'm amazed at how often articles analyzing Web traffic or "long tail"-type businesses use linear plots that fail to show what's really going on.

To compare high-volume and low-volume events in the same diagram, it's typically best to use a logarithmic plot. (If you're using Excel to plot your charts, you can get logarithmic scales by simply double-clicking each axis, choosing the Scale tab in the resulting Format Axis dialog box, and checking the box for Logarithmic scale.)

In addition to the drooping tail, my original analysis also found a hump on the traffic plot for search queries -- a different phenomenon that also only shows up on a log-log chart.

Wag the Drooping Tail

So, what would happen if our sample site could wag its traffic tail up to the straight line representing the traffic potential the theory predicts?

In my analysis, current traffic with 1,000 pages was 2.6 M pageviews over an 8-week period. With 260,000 pages, the site could expect traffic to increase to 4.8 M pageviews over the same period. That is, the 259,000 new low-traffic pages would get 2.2 M pageviews, for an average of 9 views per page.

Now, if we extend the 8-week period to a full year, the total traffic would almost double -- from 16.9 M to 32.2 M pageviews -- giving each new page an average of 58 views.

What's the value of 58 pageviews?

Over the last several years, Yahoo! has made between 0.2 and 0.4 cents per non-search pageview. However, I believe that Internet advertising is over-hyped and that advertisers are deluding themselves into overpaying. In the long term, non-search advertising's value will drop to 0.1 cents or less per page.

So, at the expected long-term value of 0.1 cents per view, 58 pageviews have a value of about 6 cents. If we assume the new pages can attract traffic for five years, and then discount future cash flow by 10% per year, the present value of each new page is 24 cents.

Not much. But we're expecting to add 259,000 pages, so the total value would be $62,000.
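The arithmetic above can be sketched as a short script, which makes it easy to rerun with your own numbers. The inputs come straight from the text. The one assumption I've added is that the first year's revenue is counted undiscounted and each later year is discounted once more, which is the reading that reproduces the 24-cent figure. The function and constant names are illustrative.

```python
def present_value(annual_amount, years, discount_rate):
    """Sum the yearly amounts, leaving year 0 undiscounted (an assumption)
    and discounting each later year by one more factor of (1 + rate)."""
    return sum(annual_amount / (1 + discount_rate) ** t for t in range(years))

# Inputs taken from the text.
VIEWS_PER_PAGE_PER_YEAR = 58
CENTS_PER_VIEW = 0.1      # expected long-term value of a non-search pageview
YEARS = 5
DISCOUNT_RATE = 0.10
NEW_PAGES = 259_000

annual_cents = VIEWS_PER_PAGE_PER_YEAR * CENTS_PER_VIEW   # about 6 cents per page per year
pv_cents = present_value(annual_cents, YEARS, DISCOUNT_RATE)
total_dollars = NEW_PAGES * pv_cents / 100

print(f"Annual value per page:  {annual_cents:.1f} cents")
print(f"Present value per page: {pv_cents:.1f} cents")
print(f"Total for all new pages: ${total_dollars:,.0f}")
```

Because the script carries the unrounded per-page value through to the end, its total lands slightly above the $62,000 you get by multiplying the rounded 24 cents by 259,000 pages.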

It sounds like a nice sum -- but could the site create 259,000 new pages for $62K? Obviously not, assuming the employees creating the pages earned salaries higher than that of the average ant.

The only feasible approach is that chosen by many sites these days: to con users into contributing content for free. However, doing so requires that sites develop a system for user contributions, which (if done correctly) requires user testing and other quality assurance before being fielded. Given that the system's features aren't particularly advanced, our sample site could probably develop it for less than $62K. But it wouldn't be free.

(Update: Chris Anderson found a drooping tail for the popularity of movies. The drooping tail shape may be more common than previously expected. Maybe now people will go back and reanalyze their long tail statistics with the correct diagrams :-)

Analysis Outcome?

It probably wouldn't pay for our sample site to take advantage of the opportunity that log analysis revealed. The long tail's end pays for aggregators that get their products from others, but companies that must develop their own products are usually better served by staying away from the full tail.

That said, pursuing the tail's end might be valuable if a site meets one of two conditions: it has a better way than low-value ads to monetize traffic, or it has so many users that the total income would be substantially more than the cost of developing the new functionality.

In any case, you should certainly run through such exploratory ROI scenarios for your own site. To do so, you need correct data analysis, and this typically requires more advanced visualizations than you see in most places. It's here that logarithmic plots deserve a chance -- despite their intimidating name.

(To avoid misunderstandings: you obviously shouldn't show log charts in websites targeted at a broad consumer audience. They're for internal use only -- or for websites like mine that target an intellectual audience.)