Are the traffic numbers for the biggest blog on the left, Daily Kos, inflated by about 60% because of an anomaly in SiteMeter that counts many page views on a high-traffic site as visits from new visitors? Patrick Ruffini explains:
Earlier today, it was pointed out to me that Kos’s average visit length was all of 2 seconds, suggesting either a coordinated attempt to bomb the site with fake traffic or an extremely low level of engagement on the part of his readers. This seemed remarkable, given all the breathless huffing and puffing about his community platform being a game changer in terms of traffic and audience reach.
So I started to do some digging around his SiteMeter stats and those of other big bloggers.
My source was right. The SiteMeter numbers are indeed fishy. But the reason is far from nefarious: a design flaw in how SiteMeter counts visits that systematically overcounts unique visitors on extremely high-traffic blogs like Daily Kos… by a lot.
First of all, I looked at the Detail view showing the last 100 visitors. Overwhelmingly, it showed visitors hitting the site only once, with a visit time of zero (you need to hit a second page for it to register any time spent). Compared with my own site, where the average visit lasts three minutes, this seemed highly improbable.
Then it hit me: SiteMeter only tracks the last 100 visitors individually. On a site like Daily Kos, the 100th most recent visitor could have arrived just 15 seconds ago. If you are the 101st most recent visitor and you click through to a new page, you are counted as a new unique visitor in SiteMeter’s all-important count. On a normal site this wouldn’t matter, since it’s highly unlikely you’ll stick around long enough for 100 others to show up after you. On a site with hundreds of thousands of page views a day, it’s extremely likely you will.
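The mechanism Ruffini describes can be sketched as a small simulation. This models his hypothesis only, not SiteMeter’s actual code: it assumes the tracker remembers the `window` most recent visitors in the order their visits began, and that any page view from someone outside that list is counted as a new visit. The function name `count_visits` and the visitor IDs are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def count_visits(page_views: Iterable[Hashable], window: int = 100) -> int:
    """Count "visits" under Ruffini's hypothesis (an assumption, not SiteMeter's code).

    The tracker remembers only the `window` most recent visitors, ordered by
    when their visit started. A page view from a remembered visitor extends
    that visit; a page view from anyone else, including a reader who has
    been pushed out of the list, is counted as a brand-new visit.
    """
    recent = OrderedDict()  # visitor id -> None, oldest visit first
    visits = 0
    for visitor in page_views:
        if visitor in recent:
            continue  # still among the last `window` visitors: same visit
        visits += 1  # not remembered: counted as a new visit
        recent[visitor] = None
        if len(recent) > window:
            recent.popitem(last=False)  # forget the oldest remembered visitor
    return visits


# One reader loads a page, others arrive, then the reader clicks again.
busy = ["reader"] + [f"other{i}" for i in range(150)] + ["reader"]
quiet = ["reader"] + [f"other{i}" for i in range(50)] + ["reader"]

print(count_visits(busy))               # 152: the reader is counted twice
print(count_visits(busy, window=4000))  # 151: a larger window keeps one visit
print(count_visits(quiet))              # 51: on a slower site, one visit
```

In this model the same reader yields one visit or two depending only on how many other people arrive between clicks, which is why the distortion would grow with traffic and ease off during slow periods. A larger window delays the split but does not eliminate it on a busy enough site.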
Other corroborating evidence includes the Daily Durations chart and the hourly Page Views per Visitor chart. During slow traffic periods (early mornings and weekends), the ratio of page views to uniques returns to more normal levels (up to about 5 to 3). There is also an odd spike in daily durations, up to 3 seconds from 2, that happens only on weekends and is very consistent, a spike you don’t see on medium-traffic blogs. What you see there is a telltale sign of the longer time horizon required for double counting (and triple counting, and so on).
Currently, Kos’s average daily “visit” count stands at about 454,000 and his daily page views at 538,000, a low 1.18 ratio. This number has fed the huge mythology surrounding Kos that he has “half a million” readers a day (I used the number 600,000 as recently as 48 hours ago), while top conservatives are stuck in the muck at about 100,000 to 150,000. These are the numbers used to populate N.Z. Bear’s frequently referenced traffic ranking.
We now know that the only thing we can trust in the SiteMeter numbers is the page view count. From that, we can arrive at a more realistic number of daily unique visitors for Daily Kos and other leading blogs.
How so? The best guide we have is probably other netroots blogs built on open community platforms, such as MyDD and OpenLeft. Their traffic is low enough that SiteMeter’s inflationary effect is minimal at most. MyDD runs on Scoop (the platform Kos uses) and OpenLeft on SoapBlox, and both have a ratio of about 1.9 page views for every visit (itself a less stringent measure than “unique visitor”). On Red State, where there is likely a little of this effect, it’s about 1.8 to 1. On a WordPress-style blog without diaries, the ratio averages 1.5 page views per visit.
Extrapolating from Kos’s page view number (538,000 ÷ 1.9), a more accurate “visitor” number for Kos would be in the neighborhood of 283,000. If Kos is a stickier site than MyDD or OpenLeft (a fair assumption), that number is probably lower. Measured against the reported 454,000, that works out to an artificial inflation in the accepted Daily Kos traffic number of about 60%.
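The extrapolation is simple arithmetic. Here is a short sketch of it, assuming (as Ruffini does) that Daily Kos readers view about 1.9 pages per visit, like MyDD and OpenLeft readers; the helper name `estimate_visits` is hypothetical.

```python
# Daily averages for Daily Kos as reported by SiteMeter (quoted above)
reported_visits = 454_000
page_views = 538_000


def estimate_visits(page_views: int, pages_per_visit: float) -> float:
    """Estimate visits from page views, the only SiteMeter figure treated as reliable."""
    return page_views / pages_per_visit


# Assumption: 1.9 page views per visit, borrowed from MyDD and OpenLeft
estimated = estimate_visits(page_views, 1.9)
inflation = reported_visits / estimated - 1  # how far the reported count overshoots

print(f"{estimated:,.0f}")  # 283,158
print(f"{inflation:.0%}")   # 60%
```

A higher pages-per-visit ratio (a stickier site) lowers the estimated visit count and so raises the implied inflation.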
By the way, this is not some theoretical exercise. This is the number SiteMeter would show if it didn’t have this quirk in counting traffic to high-velocity sites. Sites with as much or more traffic than Kos show a similar skew. That includes gadget and lifestyle blogs like Gizmodo, with 2 million page views and a 1-second average visit length, and Lifehacker, at 3 seconds.
To be fair, some conservative blogs are probably in the same boat, though the skew is probably no more than 20-30% in the most extreme case, simply because we aren’t in quite the same traffic league as Kos. Most author-led blogs average about 1.5 page views per visit, and that would peg Michelle Malkin’s actual visit number at about 130,000 (down from 140K). In effect, that means Kos is about twice the size of Michelle. That’s not something you’d necessarily want to hang your hat on, but it is dramatically different from the 4- or 5-to-1 figure that is in reporters’ minds (and was in mine until tonight). So we still have a hill to climb, but it doesn’t look quite as big as it did 24 hours ago.
Why does this matter? Because if someone uncovered a 60% ratings inflation in Rush Limbaugh’s or Bill O’Reilly’s numbers, we’d never hear the end of it.
If it's true that SiteMeter counts a page view as a new visit whenever the reader has been pushed out of its list of the last 100 visitors, then there is a serious built-in inaccuracy in how it counts visitor traffic on high-volume sites: a single reader can be counted multiple times during one visit. SiteMeter will have to address this, or the visitor statistics of high-volume sites will lose all credibility.
Update: Allahpundit at HotAir notes the question raised by Ruffini but argues that SiteMeter's figures may be roughly accurate, both because SiteMeter would surely have noticed such a problem in its software and because Alexa's traffic figures show Daily Kos traffic to be higher than that of the top conservative blogs:
. . . [SiteMeter's] been the industry standard for big blogs too since 2002. I find it hard to believe they wouldn’t have either corrected this glitch yet or at least noted it somewhere for the benefit of new customers. In fact, compare SM’s free basic service to the premium service: for $6.95 a month, SM will provide details for the last 4,000 visits, not the last 100. Which means, if Ruffini’s glitch theory is correct, that a premium account will result in a much lower (and more accurate) visit count than a free, basic account. That’s an odd feature to include in your pay service, especially when bloggers depend upon higher visit counts for ad rates.
He makes a good point, though, about how visits shrink to impossibly small durations according to SM’s metrics as sites get bigger. I’m not sure how to explain that other than the way he does. I’ll leave you with this — Dan Riehl’s side by side comparisons of dKos, Instapundit, National Review, and MM using the independent (and imperfect) metrics of Alexa. As you’ll see from his last graph, dKos’s “reach” — defined as the percentage of global Internet users who visit a site according to Alexa’s best guesstimates — is roughly three times that of MM and InstaGlenn. If Ruffini’s SM glitch theory is correct, MM’s “real” traffic numbers are on the order of 130,000 visits a day. That suggests, but of course doesn’t prove, that Kos’s traffic is quite a bit higher than the “real” number of 283,000 that Ruffini attributes to him. Here’s another graph of dKos and Instapundit head to head. Glenn’s “real” traffic would probably be 150,000 or so according to Ruffini’s theory. Kos is still way up . . . .
One commenter at HotAir, lorien1973, remarks:
Per Alexa. This is a poor estimation as well. Alexa only measures traffic based upon who has the toolbar installed. And the only group who has it installed is webmasters.
I agree that Alexa's stats are even more squirrelly and less transparent than SiteMeter's. Alexa does not even disclose the total visitor counts it relies upon, probably because those counts would reveal how many visitors it is missing.
Besides, the fact that Alexa's statistics are derived from use of its toolbar, which is popular primarily with techies, guarantees that they will be badly distorted in favor of whichever websites that community happens to favor. Moreover, Alexa's visit tracking system may contain some of the same errors as SiteMeter's. We don't know, because Alexa is not transparent.
SiteMeter's visit counts and Alexa's imperfect "reach" figures may be the best tools available so far, but even if they are, that does not mean they are accurate in all situations.
Patrick Ruffini has raised an important question that SiteMeter needs to address, either by showing that its tracking system does not count a single visit to a high-volume website two or three times as the reader moves from page to page, or by upgrading its software so that it can track traffic on high-volume websites more reliably.
Update: Another commenter, "pedestrian," adds:
Looks like right now half of Hotair’s entry pages are coming internally. It’s over 90% for dailykos, so yeah, their visit figures are meaningless.