We've known since 1997 that the Web follows a Zipf distribution for website popularity, as expressed in traffic and incoming links. Simply stated, big sites get disproportionately more traffic than smaller sites. A site ranked number 100, for example, will get 10 times the traffic of a site ranked number 1,000. (In general, site N gets M/N times the traffic of site M.)
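The M/N rule can be sketched in a few lines of Python. This is only an illustration under the assumption that traffic is exactly proportional to 1/rank; the function name traffic_ratio is made up for this example.

```python
def traffic_ratio(rank_a: int, rank_b: int) -> float:
    """Return how many times the traffic of the site at rank_b
    the site at rank_a receives, assuming traffic ~ 1/rank."""
    if rank_a < 1 or rank_b < 1:
        raise ValueError("ranks start at 1")
    # Site N gets M/N times the traffic of site M.
    return rank_b / rank_a


# Site #100 compared with site #1,000
print(traffic_ratio(100, 1_000))  # 10.0
```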

Six years later the same power law was shown to hold for the genre of websites known as weblogs. This isn't surprising, since when you subset a Zipf distribution you get a new one.

As an example, let's look at a hypothetical situation involving websites on a specialized topic, and say that these sites account for one out of every 20,000 websites. This amounts to 2,000 websites out of the 40 million currently in existence. Assuming that these 2,000 sites are evenly spread across the popularity ranking of all 40 million websites, traffic for the five largest sites would be as follows:

 

Site rank         Page views      Site rank within
on entire Web     per year        specialized topic
#20,000           10,000,000      Largest
#40,000           5,000,000       2nd largest
#60,000           3,333,333       3rd largest
#80,000           2,500,000       4th largest
#100,000          2,000,000       5th largest

 

As it happens, my own website gets 10 million page views per year, and is probably the world's most popular usability website. Since the topic probably has about 2,000 sites devoted to it, this table might describe usability websites, though it might easily describe other specialized topics as well.
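The table's figures can be reproduced with a short Python sketch. It assumes a pure 1/rank law and niche sites sitting at exactly every 20,000th rank; the constant and function names are illustrative only. It also shows why a subset of a Zipf distribution is again Zipf-like: the i-th niche site gets 1/i of the top niche site's traffic.

```python
TOTAL_SITES = 40_000_000
NICHE_SPACING = 20_000        # one site in every 20,000 belongs to the niche
TOP_NICHE_VIEWS = 10_000_000  # yearly page views of the largest niche site

# Constant implied by the table: views(rank) = ZIPF_CONSTANT / rank
ZIPF_CONSTANT = TOP_NICHE_VIEWS * NICHE_SPACING


def niche_table(rows: int) -> list[tuple[int, int, int]]:
    """Return (rank on entire Web, page views per year, rank within niche)."""
    table = []
    for niche_rank in range(1, rows + 1):
        web_rank = niche_rank * NICHE_SPACING
        views = round(ZIPF_CONSTANT / web_rank)
        table.append((web_rank, views, niche_rank))
    return table


print(f"Niche sites in total: {TOTAL_SITES // NICHE_SPACING:,}")
for web_rank, views, niche_rank in niche_table(5):
    print(f"#{web_rank:,}  {views:,}  niche rank {niche_rank}")
```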

Considering that the Web as a whole will have about 4 trillion page views this year, the table's sites might seem irrelevant with their pitiful millions of page views. But within their niche they dominate. A site that ranks as number 100,000 in the overall Web universe would still be the fifth largest within its niche: big enough to throw some weight around.
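As a rough consistency check, the same 1/rank model can be summed over all 40 million sites. The sketch below is an assumption-laden estimate, not the method behind the 4 trillion figure: it treats every site as following the pure Zipf law implied by the table and uses the standard logarithmic approximation of the harmonic series.

```python
import math

TOTAL_SITES = 40_000_000
# Constant implied by the table: site #20,000 gets 10 million views per year.
ZIPF_CONSTANT = 10_000_000 * 20_000


def harmonic(n: int) -> float:
    """Approximate H(n) = 1 + 1/2 + ... + 1/n for large n."""
    euler_gamma = 0.5772156649015329
    return math.log(n) + euler_gamma + 1 / (2 * n)


# Total yearly page views if site k receives ZIPF_CONSTANT / k views.
total_views = ZIPF_CONSTANT * harmonic(TOTAL_SITES)
print(f"{total_views:.2e} page views per year")
```

Under these assumptions the total comes to roughly 3.6 trillion page views, the same order of magnitude as the estimate of about 4 trillion.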

Furthermore, niches have their own niches. Focusing on a highly targeted subtopic can make even a tiny site with a few hundred thousand page views stand out.

Specialization Means Nobody Rules the Web

Because of the debate about ever-more centralized mass media and changing FCC regulations in the United States, the question of whether the Web is also becoming centralized has become a hot potato. However, the Web is not a mass medium. It's not broadcast. The Web is on-demand, driven by each customer's specialized need in each moment.

In a recent New York Times op-ed piece titled "More News, Less Diversity," Matthew Hindman and Kenneth Neil Cukier argued that the FCC is wrong to view the Web as a diverse information environment, since most traffic accrues to the biggest sites.

Hindman and Cukier mention that two-thirds of all hyperlinks point to the ten most popular sites of the 13,000 that cover gun control, and that the top ten sites on capital punishment receive 63% of the topic's links. Nonetheless, when we consider the Web as a whole, diversity is still ensured.

The question here is not whether some topical sites are bigger than others. They obviously are, given the power law for the Web and its subsets. The question is whether the same few sites would always dominate, regardless of a user's goal. As a sidebar looking more closely at Hindman and Cukier's examples demonstrates, that is clearly not the case:

  • There is zero overlap between the top sites for the two topics they mention.
  • Looking at more specialized sub-issues or slightly rephrasing the questions leads users to yet other sites.
  • In a different domain, the main sites for economic issues are almost completely different from the main sites for crime-related issues.

In total, searches on seven different topics identified 59 different sites among the 70 entries on the search listing's first page: only 16% of cases were multiple listings of the same site. Not exactly indicative of a few sources monopolizing Internet debate.
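The 16% figure follows directly from the counts given above. A minimal check, assuming ten first-page results for each of the seven searches (which is what 70 entries implies):

```python
SEARCHES = 7
RESULTS_PER_PAGE = 10   # assumed: 70 entries / 7 searches
DISTINCT_SITES = 59

entries = SEARCHES * RESULTS_PER_PAGE
repeat_listings = entries - DISTINCT_SITES   # listings of an already-seen site
repeat_share = repeat_listings / entries

print(f"{repeat_listings} of {entries} entries were repeats ({repeat_share:.0%})")
```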

All of these searches were performed on Google, which is currently the largest search engine. Still, there are many other search engines with big market shares.

On Microsoft's MSN search engine, only two of the top ten gun control hits were included on Google's list. And none of the nine sites in MSN's top ten capital punishment hits were included in Google's list (Amnesty International appeared twice on MSN, but not on Google). More proof that sites might be big in some contexts, but are rarely big everywhere.

Search Engine Ads Further Promote Diversity

MSN includes a sponsored Encarta encyclopedia link, which I did not count. Similarly, Google and many other search engines let anyone buy sponsored placement in small text boxes, which are an incredibly efficient form of advertising.

Any advocacy group can buy such a text ad and attract actively interested people to its website for around five cents per user. This is drastically less than the cost of printing and distributing pamphlets.

So even advertising, this most commercial Web element, supports diversity and allows small groups a greater voice than we find in the physical world.

Small Is Beautiful (and Profitable) on the Web

It's true that a few websites will get most of the traffic for any given question, because users rarely bother to go beyond the first page of hits. Still, there are many questions, and each question leads to a different list of top sites. Taking the Web as a whole, numerous specialized sites stand ready to provide their perspective on different issues, and these sites do get substantial exposure and traffic within their areas of expertise.

Small sites have two huge advantages over big sites: there are many more of them and they are more specialized and thus more targeted. Small sites speak directly to the specific needs and interests of a committed user community, and thus have much higher value per page view. A site on growing blueberries can be a must-read service for people who farm them, and thus of immense value as a place to promote blueberry-farming equipment.

Diversity is power on the Web. Big sites may be bigger, but smaller sites will keep scoring higher for specialized topics, both in terms of their connections with users and in terms of each visit's commercial value.