  • About the blog

    This is the personal blog of Simon Kendrick and covers my interests in media, technology and popular culture. All opinions expressed are my own and may not be representative of past or present employers

Meaningless statistics in the World Cup

Like most people, I’m currently enjoying watching the games during the football World Cup (though this enjoyment is tempered by the fairly poor quality of the matches and the sound of the vuvuzelas).

Because the tournament takes place just once every four years, there are relatively few data points from which statistical norms or trends can be drawn. For instance, even though the World Cup winner has alternated between Europe and South America since 1962, that represents only 12 tournaments.

As I’ve mentioned several times before, one of the reasons I follow baseball is the ability to break down every component of every play. This allows for some very powerful statistical analysis. To take an example, check out the Fangraphs page for New York Mets third baseman David Wright. Yet one of the phrases most regularly uttered on the Fangraphs blog is “small sample size” (closely followed by “regression to the mean”).

The fluid nature of football combined with fewer games prevents this level of analytical rigour. Which is fine, except that commentators or analysts treat all stats as the same. It might be fine for Opta Joe to tweet something like “1m50s – Cacau’s goal was the fastest by a German substitute at the #worldcup finals beating Uwe Reinders v Chile (2 mins 26) in 1982. Swift.”

But it is not OK for Gabby Logan (OK, I know I should be wary of the source) to say that England shouldn’t be too worried by an opening draw, since they had a similar result in their two most successful World Cup runs (1966 and 1990). Aside from the small sample, this selective look at the past is little more than confirmation bias. For instance, England also drew the opening game of the 2002 World Cup, yet this wasn’t mentioned.
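To put a rough number on how little three tournaments can tell us, here is a minimal Python sketch. It is an illustration rather than anything the broadcasters or I actually ran. It assumes that a "successful run" means reaching the semi-finals (my definition, chosen for the example), which gives two successes from the three opening draws mentioned above. The Wilson score interval is one common way of putting a range around a proportion from a small sample, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Assumption for illustration: "success" = reaching the semi-finals.
# England drew their opening game in 1966, 1990 and 2002; two of those runs qualify.
low, high = wilson_interval(successes=2, trials=3)
print(f"Plausible success rate after an opening draw: {low:.0%} to {high:.0%}")
```

Under these assumptions the interval runs from roughly a fifth to over nine-tenths, which is another way of saying that three tournaments tell us almost nothing either way.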

I like seeing statistics used in different contexts, but I don’t like seeing them misused. While I shouldn’t necessarily expect ITV Sport to employ a significance tester alongside its researcher(s), it would be nice if we could distinguish between anecdotal or illustrative information and statistical probability.

Otherwise, there is no difference between statistics and gut feeling or superstition. At least the Red Sox had some numbers to back up their curse. As it stands, there is no difference between Pele saying (every time) “This is the year an African nation will win the World Cup” and me saying “Every time North Korea qualify for the World Cup, England win it”.


Image credit: http://www.flickr.com/photos/moonrising/4695377175/


3 Responses

  1. http://www.guardian.co.uk/sport/blog/2010/may/12/the-question-important-possession

    Interesting analysis does happen, you just don’t hear an awful lot about it. The cult of Jonathan Wilson and the success of books like his Inverting the Pyramid is starting to bring it to the fore though.

    Of course, this article also suggests that even where you have over 9000 observations, if you get the analysis wrong, you can still come to a conclusion every bit as inaccurate as the lovely Gabby may have done based on 2.

  2. I know. Football is more about randomness and unpredictability – my extrapolating Gabby’s platitudes is ultimately pointless but it still grates

  3. Humans are about randomness and unpredictability as well – we still do a lot of stats on them!
