Tuesday 20 March 2018

Wisdom of Crowds

‘The Wisdom of Crowds’ is the name of a book by James Surowiecki in which he discusses the idea that, in certain situations, aggregating the knowledge of a random crowd of people can get to a better answer to a problem than any single individual could – even an expert individual. It wasn’t a new idea – according to Wikipedia (never a reliable source, so not a good ‘crowd’ example), in 1907 Francis Galton noted that a crowd at a county fair correctly guessed the weight of an ox when you took the average of all the guesses. Surowiecki’s book certainly popularised the term – I even used it in the title of a paper on drug repurposing: ‘The wisdom of crowds and the repurposing of artesunate as an anticancer drug’ – and it has become something of a standard feature of many books and courses in machine learning and data science.

The Nobel prize-winning economist and political scientist Friedrich von Hayek didn’t, as far as I know, use the term, but the idea was central to his thinking. He saw the price/market system as the wisdom of crowds in action. He saw society as a complex and self-organised system, with distributed decision making and dispersed knowledge as the key driving forces. Trying to control an economy from the top down is impossible without access to all that knowledge - knowledge that we are often not even explicitly aware that we have.

I’ve often wondered, though, whether it really works in practice, or whether the world is yet again far too complex and messy for even this simple (and surprising) idea to work. At the weekend I finally managed to see a real-world example. In the context of some fundraising for the George Pantziarka TP53 Trust (the UK charity that supports people with Li-Fraumeni syndrome), we attended the modern equivalent of Galton’s county fair – a suburban farmers’ market in south-west London. We didn’t have an ox to spare, so in our case the crowd had to correctly guess the number of chocolate Easter eggs to win the prize (see below; we’ll skate over the health effects of eating all of those eggs…).

This was my chance to get my hands on a real-world dataset. Unfortunately the weekend coincided with a blizzard, so turnout was low at the market and I was worried that the dataset wouldn’t be sufficient to show the effect. In the end we had 66 entries – and the correct answer was 145 eggs. The answers were all over the place, with a low of 50 and a high of 376 (see scatter chart below – correct answer in red). The lucky winner got close with an answer of 143.

So how wise was our crowd of 66? The average of the entire dataset was 144.1 – off by just 0.9, and so closer than the winning entry, which was off by 2. I have to admit I was surprised at just how close that is. Even more surprising is how quickly the average converged to the correct answer. The chart below shows the cumulative moving average (the mean of all the guesses received so far, recalculated after each new guess) getting close to the right answer within 15 guesses. That’s fast.
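For anyone who wants to try this on their own data, here is a minimal Python sketch of a cumulative moving average. The function name `cumulative_mean` and the list `sample_guesses` are my own illustrative choices, and the numbers in the list are made up for the example; they are not the 66 entries from the market.

```python
from itertools import accumulate


def cumulative_mean(guesses):
    """Return the running average after each guess, in arrival order."""
    running_totals = accumulate(guesses)
    return [total / n for n, total in enumerate(running_totals, start=1)]


# Hypothetical guesses for illustration only; NOT the real market data.
sample_guesses = [50, 376, 120, 143, 160, 90, 200, 130, 150, 110]

averages = cumulative_mean(sample_guesses)
for n, avg in enumerate(averages, start=1):
    print(f"after {n:2d} guesses: running average = {avg:6.1f}")
```

Plotting `averages` against the number of guesses gives the kind of convergence chart described above.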

Was that speed of convergence just a fluke? What happens when the dataset is reversed? The same thing – the cumulative moving average gets close to the correct answer incredibly quickly, even though it starts off with some wildcard answers.
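The order check can be sketched in Python too. This is only a rough illustration under my own assumptions: the helper names `running_means` and `guesses_until_settled`, the tolerance of 10, the 'true' count of 150 and the guesses themselves are all hypothetical, not the market data. The final average is the same whichever way round the guesses arrive; what can change is how soon the running average settles near the answer.

```python
def running_means(values):
    """Mean of the first n values, for n = 1 .. len(values)."""
    total = 0.0
    means = []
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means


def guesses_until_settled(values, target, tolerance):
    """Smallest n from which the running mean stays within `tolerance`
    of `target` for every remaining guess; None if it never settles."""
    means = running_means(values)
    settled = None
    # Walk backwards from the last guess while the mean is still in range.
    for n in range(len(means), 0, -1):
        if abs(means[n - 1] - target) <= tolerance:
            settled = n
        else:
            break
    return settled


# Hypothetical guesses and 'true' count, for illustration only.
sample_guesses = [50, 376, 120, 143, 160, 90, 200, 130, 150, 110]
true_count = 150

for label, ordering in (("original", sample_guesses),
                        ("reversed", sample_guesses[::-1])):
    n = guesses_until_settled(ordering, true_count, tolerance=10)
    final = running_means(ordering)[-1]
    print(f"{label}: final average {final:.1f}, settled after {n} guesses")
```

With a sample this small a single wild guess late in the sequence can delay settling, which is why the check is worth running in both directions.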

This idea might be old hat, but I for one am still impressed by these results. The applications may be limited, but it would be great to be able to harness this sort of thing to solve something a bit more meaningful than the weight of an ox or the number of chocolate eggs. I also find the democratic nature of this result incredibly satisfying.