
Tuesday, 20 March 2018

Wisdom of Crowds


‘The Wisdom of Crowds’ is the name of a book by James Surowiecki in which he discusses the idea that, in certain situations, aggregating the knowledge of a random crowd of people can produce a better answer to a problem than any single individual – even an expert – could manage. It wasn’t a new idea: according to Wikipedia (never a reliable source, so not a good ‘crowd’ example), in 1907 Francis Galton noted that a crowd at a county fair had correctly guessed the weight of an ox when the average of all the guesses was taken. Surowiecki’s book certainly popularised the term – I even used it in the title of a paper on drug repurposing: ‘The wisdom of crowds and the repurposing of artesunate as an anticancer drug’ – and it has become something of a standard feature of many books and courses in machine learning and data science.

The Nobel prize-winning economist and political scientist Friedrich von Hayek didn’t, as far as I know, use the term, but the idea was central to his thinking. He saw the price/market system as the wisdom of crowds in action, and society as a complex, self-organised system, with distributed decision-making and dispersed knowledge as the key driving forces. Trying to control an economy from the top down is impossible without access to all that knowledge - knowledge that we are often not even explicitly aware that we have.

I’ve often wondered, though, whether it really works in practice, or whether the world is, yet again, far too complex and messy for even this simple (and surprising) idea to hold. At the weekend I finally got to see a real-world example. In the context of some fundraising for the George Pantziarka TP53 Trust (the UK charity that supports people with Li-Fraumeni syndrome), we attended the modern equivalent of Galton’s county fair – a suburban farmers’ market in south-west London. We didn’t have an ox to spare, so in our case the crowd had to guess the number of chocolate Easter eggs to win the prize (see below; we’ll skate over the health effects of eating all of those eggs…).



This was my chance to get my hands on a real-world dataset. Unfortunately the weekend coincided with a blizzard, so turnout at the market was low and I was worried that the dataset wouldn’t be big enough to show the effect. In the end we had 66 entries – and the correct answer was 145 eggs. The guesses were all over the place, from a low of 50 to a high of 376 (see the scatter chart below – correct answer in red). The lucky winner got close with an answer of 143.



So how wise was our crowd of 66? The average of the entire dataset was 144.1 – closer to the true count of 145 than the winning entry. I have to admit I was surprised at just how close that is. Even more surprising is how quickly the average converged: the chart below shows the cumulative moving average getting close to the right answer within 15 guesses. That’s fast.


Was that speed of convergence just a fluke? What happens when the dataset is reversed? The same thing – the cumulative moving average gets close to the correct answer remarkably quickly, even though it starts off with some wildcard guesses.
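For anyone who wants to try this on their own data, here is a minimal sketch of the calculation behind the charts – the cumulative moving average of the guesses, computed in arrival order and again in reverse. The guesses below are randomly generated stand-ins for illustration only, not the real 66 entries.

```python
import random

# Stand-in data: 66 random guesses in the observed range (50 to 376).
random.seed(0)
guesses = [random.randint(50, 376) for _ in range(66)]

def cumulative_moving_average(values):
    # averages[i] is the mean of the first i+1 values
    averages, total = [], 0
    for i, v in enumerate(values, start=1):
        total += v
        averages.append(total / i)
    return averages

forward = cumulative_moving_average(guesses)
reverse = cumulative_moving_average(list(reversed(guesses)))

print(f"after 15 guesses: {forward[14]:.1f}, final average: {forward[-1]:.1f}")
print(f"reversed order, after 15 guesses: {reverse[14]:.1f}")
```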


This idea might be old hat, but I for one am still impressed by these results. The applications are admittedly limited, but it would be great to be able to harness this sort of thing to solve something a bit more meaningful than the weight of an ox or the number of chocolate eggs. I also find the democratic nature of the result incredibly satisfying.

Wednesday, 26 April 2017

Myc, Models and Tumour Growth

When I wrote the NEATG model of tumour growth, published in the journal PeerJ (https://peerj.com/articles/2176/), I focused on the behaviour of individual cells and on the role of cell competition and cell death in tumour growth. All models have to simplify and abstract, and the NEATG model does exactly that. The model steered clear of the molecular basis for the behaviour of the cells – my working assumption was that cells produce soluble factors and signalling proteins that mediate their cell-cell and cell-tissue interactions. It was one of the reasons I called the model NEATG (Non-physiological Evolutionary Algorithm for Tumour Growth – though as Sebastien Benzekry pointed out to me, the fact that the model has cells and tissues already makes it fairly physiological… ).

Despite the level of abstraction, the model clearly does show cellular behaviours that mimic aspects of tumour growth and of the response to cytotoxic chemotherapy. One surprise for me was that, in the model, accelerated tumour regrowth following chemotherapy is driven by cell competition and the level of cell death. I spent time looking at the research literature on these topics and found that my results were actually in line with clinically relevant phenomena – which is what makes the model interesting.

A recent paper from a group of researchers at the University of Bologna entitled MYC, Cell Competition, and Cell Death in Cancer: The Inseparable Triad (available open access from the journal Genes – it’s well worth a read) casts some interesting light on the topic. c-Myc is one of the most important of the master genes involved in cell cycle progression and tissue growth. It encodes a transcription factor, which means it regulates the activity of other genes, and it is often switched on permanently in tumours rather than coming on and off as required. It’s widely associated with a whole range of different cancers.

In this paper the authors review recent evidence from Drosophila (fruit fly) models of cancer on the roles of cell growth, cell death and cell competition. They outline the way that competition selects for the ‘fittest’ cells while less fit cells are effectively killed off – just as they are in software in my model. The evidence they outline suggests that one of the drivers of this cellular behaviour is c-Myc, making it an important physiological-level component that is missing from NEATG. The authors themselves make note of the results from the NEATG model:

Our findings suggest that CC [cell competition] is an innate process governing both cancer initiation and progression, where cell death fuels the clonal expansion of the fittest cells in the context. CC and apoptosis thus appear to be strictly linked one another, and emerge as fundamental cancer drivers also in a computational model of tumour growth, where several parameters of malignancy such as intra-tumour heterogeneity and accelerated repopulation have been taken into account.
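The basic loop being described – neighbouring cells compared, the less fit cell dying, and the fitter cell expanding into the freed space – can be sketched very simply. This is only an illustrative toy, not the NEATG code itself (that is described in the PeerJ paper):

```python
import random

# Toy illustration of cell competition (not the NEATG implementation):
# cells carry a heritable 'fitness' value; in each round, neighbouring
# cells are compared, the less fit cell of each pair dies, and the fitter
# cell divides into the vacated slot with a small heritable variation.

random.seed(1)
cells = [random.uniform(0.5, 1.5) for _ in range(100)]   # one fitness value per cell

for generation in range(50):
    for i in range(0, len(cells) - 1, 2):                # compare adjacent pairs
        winner = max(cells[i], cells[i + 1])
        cells[i] = winner                                # the loser is replaced...
        cells[i + 1] = winner + random.gauss(0, 0.01)    # ...by a near-copy of the winner
    random.shuffle(cells)                                # mix the 'tissue' between rounds

print(f"mean fitness after competition: {sum(cells) / len(cells):.2f}")
```

Run it and the mean fitness drifts steadily upwards: cell death is exactly what fuels the clonal expansion of the fitter cells.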

For the next stage of work on this model I will be investigating the metabolic impact of cell growth in tumour growth, and the impact that cell death has on this. My prediction is that the model will show that populations of ‘super-feeders’ emerge during tumour growth – and that chemotherapy helps to select for these populations of cells. However, at this stage I don’t really know whether that’s what happens – I have to wait and see what the data tells me…

In the meantime I’ll finish with some final words from the team at Bologna:

Cells cooperate to build an organ and, in a similar way, they cooperate to build a cancer. Although the contexts are impressively distant, MYC-mediated cell competition seems to be at work in both cases with the same basic, sequential elements: cell–cell disparity in MYC contents, death of the cells with lower MYC levels, and proliferation of the cells with higher MYC levels. This stereotypical module shapes organ development and, possibly, cancer evolution. In growing tumours, an excess of dying cells is known to contribute to mass expansion, but the implication of MYC-mediated cell competition in this cancer trait has just begun to be investigated. Further research is warranted on the intricate “life and death” signals exchanged by confronting cell populations within the cancer community.


Monday, 14 December 2015

NEATG - A software model of cancer

For a huge chunk of my working life I have built computer models which were used to assess operational activities in different industries. The combination of mathematics and software can provide enormous power to help understand and assess complex processes. My doctorate put these skills to good use in that I used software implementations of evolutionary processes to build a system that evolved mathematical models which could validate the correctness, or otherwise, of large data sets. In plain English I used genetic algorithms to discover mathematical models which could pick out incorrect data values in large volumes of data. Think of a system that could take the largest Excel spreadsheets and automatically flag those rows of data which were most likely to be in error – all without knowing what the spreadsheet data represented or who had put it together or why.
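As a rough, purely hypothetical sketch of that idea (not my actual doctoral system), one could evolve simple linear models that predict one numeric column of a table from the others, and then flag the rows that the evolved model predicts worst:

```python
import random

# Hypothetical sketch only: evolve linear models that predict the last
# column of a numeric table from the other columns, then flag rows whose
# prediction error is unusually large. 'table' is a list of equal-length
# numeric lists.

def predict(weights, row):
    # weights[0] is an intercept; the rest multiply the predictor columns
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row[:-1]))

def fitness(weights, table):
    # median absolute error is robust to the very rows we want to catch
    errors = sorted(abs(predict(weights, r) - r[-1]) for r in table)
    return errors[len(errors) // 2]

def evolve(table, pop_size=50, generations=100):
    n = len(table[0])          # intercept plus one weight per predictor column
    population = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, table))
        parents = population[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [w + random.gauss(0, 0.1) for w in child]     # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda w: fitness(w, table))

def flag_suspect_rows(table, tolerance=3.0):
    best = evolve(table)
    typical_error = fitness(best, table)
    return [i for i, row in enumerate(table)
            if abs(predict(best, row) - row[-1]) > tolerance * max(typical_error, 1e-9)]
```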

Of course cancer is the ultimate evolutionary system – if you wanted to design a system to illustrate evolution at work, you’d come up with something pretty much like it. When we see some treatments produce fantastic initial responses, with tumours shrinking away to almost nothing, followed by a rebound in which the cancer comes back more aggressive and more resistant to treatment, we are seeing evolution at work.

Given my background in computer modelling and my current work in oncology, it should be no surprise that I’ve worked on a software model of tumour growth. I’ve called it NEATG – for Non-physiological Evolutionary Algorithm for Tumour Growth. It’s a computational model – it’s about algorithms rather than about trying to recreate in software the vast complexities and details of cells, proteins, signals and pathways. Although it’s a simple model by design, it does illustrate some interesting behaviour reminiscent of real tumour growth.

Tumour growth in NEATG

For example, the NEATG system can model the growth of a tumour mass (in two dimensions), the rise of genetically distinct sub-populations of cancer cells, and different interventions such as chemotherapy or nutrient deprivation. What is more, it displays emergent behaviour – such as a more aggressive growth pattern following the cessation of treatment. This is behaviour that emerges naturally from the interactions between cells and tissues, not behaviour that has been explicitly programmed into the system as a set of predefined rules.
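To give a flavour of the kind of model being described – and this is only a generic toy, not the NEATG algorithm itself, which is set out in the paper – here is a minimal two-dimensional, cell-based growth sketch with a heritable growth rate and a crude pulse of ‘chemotherapy’:

```python
import random

# Generic toy sketch (not the NEATG algorithm itself): cells live on a
# 2D grid, each carries a heritable 'growth rate', and a crude pulse of
# chemotherapy kills cells at random for a fixed window of time steps.

SIZE = 50                                  # the grid is SIZE x SIZE
grid = {(SIZE // 2, SIZE // 2): 0.3}       # (x, y) -> growth rate; seed one cell

def neighbours(x, y):
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx or dy) and 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]

def step(chemo=False):
    for (x, y), rate in list(grid.items()):
        if chemo and random.random() < 0.4:        # crude cytotoxic kill
            del grid[(x, y)]
            continue
        if random.random() < rate:                 # attempt to divide
            free = [n for n in neighbours(x, y) if n not in grid]
            if free:
                # the daughter inherits the growth rate with a small mutation,
                # so faster-growing sub-populations can emerge over time
                child_rate = min(1.0, max(0.01, rate + random.gauss(0, 0.02)))
                grid[random.choice(free)] = child_rate

random.seed(0)
for t in range(200):
    step(chemo=100 <= t < 120)                     # 'treat' between steps 100 and 120
    if t % 20 == 0 and grid:
        mean_rate = sum(grid.values()) / len(grid)
        print(f"step {t:3d}: {len(grid):4d} cells, mean growth rate {mean_rate:.2f}")
```

Even in a toy like this, the cells that survive treatment tend to carry higher growth rates than the starting population – the regrowth is driven by what the treatment selects for, not by anything explicitly programmed in.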

For now NEATG is a tool that can be used to explore different algorithmic scenarios – you can try out different thought experiments and see what happens. It’s good for thinking about some of the most fundamental aspects of cancer without getting bogged down in the molecular biology. For example, while most people think of cancer as primarily a disease of disordered genes – a view known as the ‘somatic mutation theory’ of cancer – there is an alternative theory called the ‘tissue organisation field theory’ of cancer. In this theory disordered genes are more of a by-product than a cause of cancer, and the emphasis is placed on the disordered tissue environment. Simplistically, we can ask: is it the delinquent cell or the bad neighbourhood that causes cancer? This is a good question to explore using a suitable software model – and I hope that NEATG can be applied to it.

While it’s still early days for this piece of work, I have written a paper on it which is available as a preprint (i.e. prior to peer review) at PeerJ. If you’re interested, please take a look.