This is the personal blog of Simon Kendrick and covers my interests in media, technology and popular culture. All opinions expressed are my own and may not be representative of past or present employers.
Data should be used as evidence and not illustration

I read the Guardian article on journalists’ struggles with “data literacy” with interest. The piece concentrates on inaccurate reporting through a lack of understanding of numbers, and of the context around them. “Honest mistakes”, of a sort.

Taken more cynically, it is an example of a fallacy that I see regularly in many different disciplines (I’m loath to call it a trend as, for all I know, this could be a long-standing problem) – fitting data around a pre-constructed narrative, rather than deducing the main story from the available information.

This is dangerous. It reduces data to nothing more than anecdotal support for our subjective viewpoints. While Steve Jobs may have had a skill for telling people what they really wanted, he is the exception rather than the rule. We as human beings are flawed, biased and incapable of objectivity.

Given the complexity of our surroundings, we will (probably) never fully understand how everything fits together – this article from Jonah Lehrer on the problems with the reductionist scientific method is fascinating. However, many of us can certainly act with more critical acumen than we currently do.

This is as incumbent on the audience as it is on the communicator – as MG Siegler recently wrote in relation to his field of technology journalism, “most of what is written… is bullshit”, and readers should exercise more caution when taking news as given.

Whether it is due to time pressures, lack of skills, laziness, pressure to deliver a specific outcome or otherwise, we need to avoid this trap and – to the best of our abilities – let our conclusions or recommendations emerge from the available data, rather than simply using it to illustrate our subjective biases.

While I am a (now no more than occasional) blogger, I am not a journalist, and so I’ll limit my potential criticisms of that field. However, I am a researcher who has at various points worked closely with many other disciplines (some data-orientated, some editorial, some creative), and I see this fundamental problem recurring in a variety of contexts.

When collating evidence, the best means of ensuring its veracity is to collect it yourself – in my situation, that would be to conduct primary research and to meet the various quality standards that ensure a reliable methodology and coherent conclusions.

Primary research isn’t realistic in many cases, due to limited levels of time, money and skills. As such, we rely on collating existing data sources. This interpretation of secondary research is where I believe the problem of illustration above evidence is most likely to occur.

There are two stages that can help overcome this – critical evaluation of sources, and counterfactual hypotheses.

To critically evaluate data sources, I’ve created a CRAP sheet mnemonic that can help filter the unusable data from the trustworthy:

  • Communication – does the interpretation support the actual data upon scrutiny? For instance, people have been quick to cite Pinterest’s UK skew to male users as a real difference in culture between the UK and US, rather than entertain the notion that UK use is still constrained to the early adopting tech community, whereas US use is – marginally – more mature and has diffused outwards
  • Recency – when was the data created (and not when was it communicated)? For instance, I’d try to avoid quoting 2010 research into iPads, since tablets are a nascent and fast-moving industry. Data on underlying human motivations is likely to have a longer shelf-life. This is why, despite the accolades and endorsements, I’m loath to cite this online word of mouth article: it is from 2004 – before both Twitter and Facebook
  • Audience – who is the data among? Would data among US C-suite executives be analogous to data among UK business owners? Also, some companies specialising in PR research have been notoriously bad at claiming a representative adult audience when, in reality, they are usually drawing on a self-selecting sub-sample
  • Provenance – where did the data originally come from? In the same way as students are discouraged from citing Wikipedia, we should go to the original source of the data to discover where the data came from, and for what purpose. For instance, data from a lobby group re-affirming their position is unlikely to be the most reliable. It also helps us escape from the echo chamber, where myth can quickly become fact.

Counterfactual hypotheses are the equivalent of control experiments – could arguments or conclusions still be true in the absence of key variables? We should look for conflicting conclusions within our evidence, to see if they can be justified with the same level of certainty. This method is fairly limited, since we are ultimately constrained by our own viewpoints. Nevertheless, it offers at least some challenge to our pre-existing notions of what is and isn’t correct.

Data literacy is an important skill to have – not least because, as Neil Perkin has previously written, it is only the first step on the DIKW hierarchy towards wisdom. While Sturgeon’s Law might apply to existing data, we need to be more robust in our methods, and more critical in our judgements. (I appreciate the irony of citing an anecdotal phenomenon.)

It is a planner trope that presentations should contain selective quotes to inspire or frame an argument, and I’ve written in the past about how easily these can contradict one another. A framing device is one thing; a tenet of an argument is another. As such, it is imperative that we use data as evidence and not as illustration.


Image credit: http://www.flickr.com/photos/etringita/854298772/

Avoiding insights

I really don’t like using the word “insight”.

As I wrote here, the word is hideously overused. Rather than being reserved for hidden or complex knowledge, it is used to describe any observation, analysis or piece of intelligence.

And so I’ve avoided using it as much as possible. In an earlier tweet, I referred to the Mobile Insights Conference that I’ve booked to attend as the MRS Mobile thing. And I even apologised for my colleague (well, technically, employer) littering our Brandheld mobile internet presentation with the word.

But this is irrational. I shouldn’t avoid it, if it is the correct word to use. After all, substituting it for words like understanding, knowledge or evidence might be correct in some instances, but not all.

Does it really matter? After all, isn’t a word just a word? As someone once said, “What’s in a name? That which we call a rose by any other name would smell as sweet”.

But he’s talking complete rubbish. Because words do matter. They cloud our perceptions. It is why brands, and brand names, are so important. And why blind taste tests give different results to those that are open.

In fact, this emotional bond we have with words has undoubtedly contributed to my disdain. And this should stop. So I vow to start reusing the word insight, when it is appropriate.

But when is it appropriate? I’ve already said that an insight is hidden and complex, but then so is Thomas Pynchon and he is not an insight.

In the book Creating Market Insight by Drs Brian Smith and Paul Raspin, an insight is described as a form of knowledge. Knowledge itself is distinct from information and data:

  • Data is something that has no meaning
  • Information is data with meaning and description, and gives data its context
  • Knowledge is organised and structured, and draws upon multiple pieces of information

In some respects it is similar to the DIKW model that Neil Perkin recently talked about, with insight replacing wisdom.

However, in this model – which was created in reference to marketing strategy – an insight is a form of knowledge that conforms to the VRIO framework.

  • Valuable – it informs or enables actions that are valued. It relates to change rather than maintenance
  • Rare – it is not shared, or cannot be used, by competitors
  • Inimitable – it cannot be copied profitably within one planning cycle
  • Organisationally aligned – it can be acted upon within a reasonable amount of change

This form of knowledge operates across three dimensions. It can be:

  • Narrow or broad
  • Continuous or discontinuous
  • Transient or lasting

How often do these factors apply to supposed insights? Are these amazing discoveries really rare and inimitable, and can they really create value with minimal need for change? Perhaps, but often not.

And Insight departments are either amazingly talented at uncovering these unique pieces of wisdom, or they are overselling their function somewhat.

When I’m analysing a piece of privately commissioned work, a finding could be considered rare and possibly inimitable (though it could be easily discovered independently, since we don’t use black box “magic formula” methodologies). But while it is hopefully interesting, it won’t always be valuable and actionable.

But if it is, I shall call it an insight.


Image credit: http://www.flickr.com/photos/sea-turtle/2556613938/
