Popper Versus Bacon

Peter Coveney [5.7.15]

by John Brockman

There’s a massive clash of philosophies at the heart of modern science. One philosophy, called Baconianism after Sir Francis Bacon, neglects theoretical underpinning and says just make observations, collect data, and interrogate them. This approach is widespread in modern biology and medicine, where it’s often called informatics. But there’s a quite different philosophy, traditionally used in physics, formulated by another British Knight, Sir Karl Popper. In this approach, we make predictions from models and we test them, then iterate our theories.

In modern medicine you might find it strange that many people don’t think in theoretical terms. It's a shock to many physical scientists when they encounter this attitude, particularly when it is accompanied by a conflation of correlation with causation. Meanwhile, in physics, it is extremely hard to go from modeling simple situations consisting of a handful of particles to the complexity of the real world, and to combine theories that work at different levels, such as macroscopic theories (where there is an arrow of time) and microscopic ones (where theories are indifferent to the direction of time).

At University College London, physical chemist Peter Coveney is using theory, modeling and supercomputing to predict material properties from basic chemical information, and to mash up biological knowledge at a range of levels, from biomolecules to organs, into timely and predictive clinical information to help doctors. In doing this, he is testing a novel way to blend the Baconian and Popperian approaches and has already had some success when it comes to personalized medicine and predicting the properties of next generation composites.


PETER COVENEY holds a chair in Physical Chemistry, and is director of the Centre for Computational Science at University College London and co-author, with Roger Highfield, of The Arrow of Time and Frontiers of Complexity.


All my life, the problems that have been of interest to me are the ones that connect science into a whole. Do we have an integrated theory of science, as it were, or is science broken into many separate parts that don't add up? We have, from observation, the expectation we live in a single universe, so we'd expect consistency, and that's what leads you to demand this of the theories that describe the world we live in.

If you look at the way we categorize our theories, there are different ways of analyzing them. Some lie within the domain of physics or even applied mathematics. We have chemistry, biology, engineering; these usually are regarded as separate disciplines and historically have had comparatively little to do with one another. So it's not a surprise, when you ask who's doing what in scientific terms, that the answer to my question about a unified theory of knowledge, so to speak, is that it's still rather fragmented today.

We have people who explore the extremely large scale (you might call that cosmology) or the very small scales. Again, that's a physical domain: subatomic theories, going down to extremely short length and timescales. We can have problems that relate to life, such as where life has come from on this planet, but we have plenty of reasons to suspect that it's probably much more widespread than that, and then questions are posed in rather different ways.

In modern biology and medicine today you would find most people not even trying to think in theoretical terms. It's quite a shock to many physical scientists when they encounter this. It's a funny clash between two philosophies of science that have been around, overall, for 500 years or so. What we call "Baconian theory" says, don't worry about a theoretical underpinning, just make observations, collect data, and interrogate the data. This Baconianism, as it's come to be known, is very widespread in modern biology and medicine. It's sometimes also called informatics today.

We have the other model of the philosophy of science, which is the physicists’ one, formulated in a nice and concise way by Sir Karl Popper. These are two curious Knights of the British Realm, in fact, whose descriptions of the way science works are at complete odds with one another. A Popperian theory is one that's fundamentally mathematical, and in which you can describe reality in terms that are somehow out there, objective. We make predictions from these theories and models and we test them. If the agreement isn't good enough, it could be that the experimental observations are wrong. Every now and then we have to change the theory.

If you practice, as many of our biological colleagues do today, a Baconian approach, there isn't an underpinning theory. There's nothing that needs to go wrong, it's just a necessary requirement to keep on collecting data. Once you become influenced by these things and you want to understand, in the modern context, tangible things like how I can make sense of the human as a scientific entity, can I predict things about the way a human's life is going to evolve, which methodology am I going to choose? I'm more physically based. I'd like a Popperian theory, but I rapidly run up against people who don't relate to that. We have a massive clash of doctrines at the heart of these descriptions.

There’s definitely a widespread movement in scientific circles—in life and medical sciences—which is about just capturing data, don't worry about theoretical underpinnings. Indeed, some people would deny there is a value to having a theory. The idea is just continue to collect data. At some point though—you may understand what I'm getting at—as our understanding of science progresses, we're asking, have our theories got some validity that's much more universal?                                 

Never mind the theoretical physicists' claim about universality that applies to some areas of observation, which now we know to be exceedingly limited. I'm talking about areas of our own direct experience, so we'd expect to have methods that can be applicable in the medical area just as surely as they could in chemistry, or in physics, material science, or engineering. That's the type of question that we're talking about here. Is there some crazy rupture that suggests that things just are too amorphous that they cannot admit theoretical underpinnings?                                 

Certainly, that's not my position on this at all. I would agree that it's beneficial, in areas where we don't understand a whole lot, to do a lot of observational work initially. That will help you unravel some of the correlations that do exist. The big challenge is then to make sense of that in a deeper way, and that's usually forgotten. Unfortunately, we get the conflation of causality with correlation there, which is clearly a false one.                                 

If I were involved, as I am, in trying to support a rather forward-looking version of medicine, which is to say, given some information about a patient (it'll usually be in digital form; it could be their genome, it might be that and a lot of other data, it might include imaging data and so on), I need to assist a clinician who has to make a decision about what intervention to carry out. What method am I going to use to do that? The methods that we are interested in using are the good old ones that ultimately are Popperian, physics-based ones. But that pushes the modeling and the theory into areas where it's quite unfamiliar and creates interesting challenges. That's just part of the whole agenda that I'm interested in. How far can you use your theoretical methods across science as a whole? There are plenty of other domains we could discuss there.

Some of the biggest questions remain open. Things like consciousness have to be continually studied to be apprehended. That doesn't mean if we don't understand that we understand nothing about the way life evolves. It's very far from that. It's much more of a sigmoidal curve of understanding, growing, and being able to benefit from that understanding to continually accumulate improved methods of prediction, which, in the medical area, will transform the whole domain.                                 

If you just stick with personalized medicine for a moment, the questions are to do with: so what if I know your entire genome sequence? Again, if you were a reductionist molecular biologist, pulling Dawkins' leg for a moment, you might think that's the blueprint, that's all we need to know and the rest is then a consequence of what that genome sequence is. I don't think anyone seriously believes that to be the case. The huge number of genome studies that people carry out today show that no matter how much people try to use so-called big data analytics (informatics), they cannot get clear correlations that account for disease cases which are based solely on genomic data. This is an extremely rare occurrence.

You've got an entanglement of data coming at those levels, with higher levels of information. It could be organ system levels. If you had a problem with your heart and you have to go to see a doctor and this doctor has to perform surgery, are they going to look up your genome sequence before they carry out that surgery? It's, of course, not going to happen. In the long run, we will benefit from that information, but, as it were, to zeroth order, they will use information which is at a higher level. You can build physically-based models of the heart at those levels which can be very helpful for predicting what kind of intervention should be carried out, but you will not need, at the first level, any information on the genomic component.

It's going to depend on what the problem is, which level you select to take as the primary one to base your analysis on. This is the same through all of science. When we talk about physics, there’s a sense that chemistry somehow sits off it, maybe biology and engineering too. The same applies when we talk about the organic or the medical. The same analogies pertain there. If I'm trying to design new materials, once again their chemistry uses quite different levels of description from their properties.                                 

In the modern era—there have been some interesting developments in the last week which relate to this—we all know it's not sustainable to keep running automotives or aerospace entities based on steel. I would expect within fifty years or probably a lot less, people will have made different types of materials that are as strong, probably tougher, have greater durability, and are far less heavy so they don't require anything like the amount of energy to move around. That's part of the drive. We're interested in that. How do you produce those materials? How do you go about creating them? This is another analogous challenge. I need to know chemistry; I need the physics and the chemistry of what the ingredients are to mix; I need to be able to predict engineering properties out of them, mechanical properties—strength, toughness, durability—which are usually reserved to some other domain to discuss with their own concepts.                                 

This isn't workable now, so we have to have a framework in which we can put these things together. The question is, does such a framework exist, and can I do it? Clearly, my view is that that's possible, but it brings different approaches and concepts together, sometimes different philosophies and mindsets, that need to be properly aligned.                

The opportunities that come the way of chemistry to promote itself are usually spurned and squandered by the establishment in the field. I've already mentioned one in passing a few minutes ago, and that's to do with the origin of life. The origin of life on Earth is fundamentally a chemical question. How did the first self-replicating molecules emerge, if they did, from some Darwinian soup? That's a chemical question, therefore, it's the equivalent of consciousness or the origin of evolution, cosmology, and things like this. The thinking person wants to know about it. And yet the community has spurned it, on the grounds that it's somehow not a respectable discipline.

The chemists have never pursued it properly. The pressure is growing to do it. It's like cosmology; quite a long time ago, we had people speculating and there's now a lot of data out there. Your theories can be tested. The same thing is true for origins of life scenarios. These exoplanets are going to reveal life before too long—small things, I imagine. We need to explain where these things are coming from. That is a purely chemical question, in my opinion.              

Even chemical companies have undergone a change in the last twenty or more years. They were very influential, probably after the Second World War, in encouraging the development of methods that had a strong academic base. In the modern era, exigencies are such that they leave the research to the academics; they're much more interested in short-term benefits. They're far less influential in supporting that kind of chemistry. It has had an impact on the way chemistry is run, you might think beneficially, because there is less direct influence from industry. But chemistry is probably suffering partly as a result of a lack of that interaction because when you have real-world problems to solve, it may sound dirty and messy, but they can often lead to very interesting new ideas. There's a comparative lack of that today.

My own research agenda may seem curious to people who just look at what I'm doing randomly, as if it's kaleidoscopic, but it's not. It's always been systematic, exactly along the lines we're talking about. The fundamental thing I'm interested in is how I connect an understanding of things on the very small scales with the larger levels. Microscopic to macroscopic. That might be seen as enshrined in the relationship between atoms and molecules and thermodynamics, like a classical description from the 19th century, 20th century, Boltzmann, et cetera. But that program is still there, and that's the hope that makes you believe that it is possible. As long as you can connect microscopic descriptions to larger scale, you have the hope of being able to predict all these things, whether it's inanimate or animate matter, in terms that relate to these shorter length scales where it's necessary.                                 

Going back to the industry end of this, if someone wants to make a product, no matter how mundane it is (shampoo or something like that), the next formulation that's going to make them a lot of money has to be chemically specific. They need to know the chemicals, the molecules that make it up, that are in there. They want to know what the properties are of the material when you squeeze it out of the tube. At this moment in time, that relationship is completely ill-defined because it's not systematically laid down. That's the sort of thing, from an applied point of view, I'm interested in. How can I dial up chemical structures (the molecules that go into something) and then tell you, in principle without doing the experiment, what the properties would be? That's the sort of firm motivation.

It's a good question: how do you relate the two together? They appear to be completely at odds with one another. Depending on whom you're dealing with, they are, because the education and training of the people concerned is so different. But if you have enough understanding of what's going on between the two, you can draw off both beneficially. That was something that's hinted at in some of the descriptions we're giving.                                 

If I have a complicated process in a cell (it might be a whole sequence of chemical reactions in metabolism, or the means with which a virus infects me or someone else), I need to know all the steps that are going to occur in that process. If it was for medicinal purposes, I need that information because I might want to target one of the proteins involved. I need to have the detail, ultimately some microscopic information. But the models need so much data that I will not have all the information for them. I need to get experimental insight.

People have to go around measuring things. There's no escape from that for most of that type of work. There's a deep relationship between the two. No one's going to come up with a model that works without going and comparing with experiment. But it is the intelligent use of experimental measurements that we're after there because that goes to this concept of Bayesian methods. I will perform the right number of experiments to make measurements of, say, the time series evolution of a given set of proteins. From those data, when things are varying in time, I can map that on to my deterministic Popperian model and infer what's the most likely value of all the parameters that would be Popperian ones that would fit into the model. It's an intelligent interaction between them that's necessary in many complicated situations.              
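The kind of inference described here can be sketched in a few lines. What follows is only a toy illustration, not the method used in the work discussed: it assumes a single protein whose concentration decays at an unknown rate k, synthetic noisy measurements in place of real experiments, Gaussian measurement noise of known size, and a flat prior over a grid of candidate rates. The model, the numbers, and all function names are hypothetical.

```python
import math
import random


def model(t, k, x0=1.0):
    """Deterministic model (assumed): first-order decay of a concentration."""
    return x0 * math.exp(-k * t)


def log_likelihood(k, times, observations, sigma):
    """Log-likelihood of the data under Gaussian noise with known sigma."""
    return sum(-0.5 * ((y - model(t, k)) / sigma) ** 2
               for t, y in zip(times, observations))


def posterior_on_grid(times, observations, sigma, k_grid):
    """Normalized posterior over k on a grid, assuming a flat prior."""
    log_post = [log_likelihood(k, times, observations, sigma) for k in k_grid]
    peak = max(log_post)  # subtract the peak to avoid numerical underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic "experiment": a time series generated from a known rate plus noise.
rng = random.Random(0)
true_k, sigma = 0.5, 0.05
times = [0.5 * i for i in range(13)]
observations = [model(t, true_k) + rng.gauss(0.0, sigma) for t in times]

# Candidate rate constants and their posterior probabilities.
k_grid = [0.01 * i for i in range(1, 201)]
posterior = posterior_on_grid(times, observations, sigma, k_grid)

# Most likely parameter value and posterior mean.
k_map = k_grid[max(range(len(k_grid)), key=posterior.__getitem__)]
k_mean = sum(k * p for k, p in zip(k_grid, posterior))
print("most likely k:", k_map, "posterior mean:", k_mean)
```

The data never replace the model in this sketch; they only narrow down which value of the model's parameter is most plausible. A realistic version would have many parameters and a sampling method rather than a grid.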

There is, for example, the work of Judea Pearl (at UCLA) and others; we do share a similar take on this. The problem we face in the bio and biomedical world today is a serious potential clash between two approaches which should be aligned for the beneficial reasons I just outlined. You use data with Popperian methods. In a Bayesian fashion, you can extract deterministic descriptions in a desirable way.

But the real fundamental limitation there, I have to say, is education and training of people in these disciplines. One of the things we may be moving on to, and it should appear in the descriptions that are given in the future, is that if I want to be able to educate and train a doctor to carry out the sort of interventions that I mentioned earlier, they have to understand a lot more about the theoretical basis of their subject. Otherwise, they are not going to be able to champion these approaches, and we will be spending years with fingers crossed hoping that they do adopt them. There is a big, big challenge there.              

There's a lot of work that goes on in population studies, which by definition is almost Baconian, so we get lots of correlations about the way individuals' heart failures may be influenced by various environmental considerations, et cetera. But that has to be seen as only the first step in drilling down to the individual. Clearly, we don't have that understanding yet.              

The principles are similar in what we call today methodologies that involve multi-scale or multi-physics capability. That does mean precisely what I was saying earlier, that I have different descriptions of, say, matter at different levels, tried and tested, believed in by different communities, say, engineers, physicists, chemists. If we believe these are correct, we have to get them together and make them work. You would like to believe, as a physicist, foundationally (that's my inclination there), that I know how to derive higher-level descriptions of matter from lower-level ones. In theoretical terms, in principle, yes. In practice, this is the challenge.

How do I start from that set of molecules? Their description is quantum mechanical. In some of the work we do, we will be doing quantum mechanics, calculating electron densities of molecules and complicated entities, materials as well. We know that we’ll never (this is the relationship with Dirac) be able to get to the largest scales that matter to us by doing calculations on those length and time scales. We have to find ways of extracting the key information that comes out of those calculations, and passing it to higher levels, where we get more length and timescale return from our investigations.

Typically today, we are actively involved in what I would call three-level couplings. We have to do quantum mechanics, we do Newtonian mechanics, and we may be doing something that's getting towards a continuum level, or something between Newtonian mechanics for atoms and the continuum levels. We call it a coarse-grained representation. It's more arbitrary; we cluster more atoms together.           

All the time, we do this because of the computational complexity. I can't do a large simulation with a lot of electrons in it; it's far too expensive. If I can bring the level of complexity, the number of degrees of freedom, down, I can do larger-scale simulations. Those simulations should be as accurate and as faithful to the molecular information as possible. It's a challenge, A, to figure out what the key information is that passes between the levels, and then, B, to be able to do those calculations to the scale that gives you high-fidelity predictions. It's connecting multiple levels. There are experts in different departments in academic circles. Maybe I'm an expert in quantum mechanics, or the electronic structure of matter. There’s someone who's an expert in some high-level representation, and we might need to deal with an engineer who knows how to deal with finite element analysis. But I don't want those things to stand alone. I need them to be integrated.
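As a rough illustration of what clustering atoms together means, here is a minimal sketch that replaces groups of atoms with single beads placed at each group's center of mass. It is a toy under stated assumptions, not the coarse-graining scheme used in the work described: the six-atom fragment, its coordinates, the grouping, and the function names are all hypothetical, and a real scheme would also need effective interactions between the beads.

```python
def coarse_grain(atoms, groups):
    """Map atoms onto beads.

    atoms:  list of (mass, (x, y, z)) tuples
    groups: list of lists of atom indices, one list per bead
    Each bead carries the total mass of its group and sits at the
    group's center of mass.
    """
    beads = []
    for indices in groups:
        mass = sum(atoms[i][0] for i in indices)
        position = tuple(
            sum(atoms[i][0] * atoms[i][1][axis] for i in indices) / mass
            for axis in range(3)
        )
        beads.append((mass, position))
    return beads


def center_of_mass(particles):
    """Center of mass of a list of (mass, (x, y, z)) particles."""
    total = sum(m for m, _ in particles)
    return tuple(
        sum(m * r[axis] for m, r in particles) / total for axis in range(3)
    )


# Toy fragment: two heavy atoms, each with two light atoms attached.
atoms = [
    (12.0, (0.0, 0.0, 0.0)),
    (1.0, (0.0, 1.0, 0.0)),
    (1.0, (0.0, -1.0, 0.0)),
    (12.0, (1.5, 0.0, 0.0)),
    (1.0, (1.5, 1.0, 0.0)),
    (1.0, (1.5, -1.0, 0.0)),
]
groups = [[0, 1, 2], [3, 4, 5]]  # hypothetical mapping: three atoms per bead

beads = coarse_grain(atoms, groups)
print("positional degrees of freedom:", 3 * len(atoms), "->", 3 * len(beads))
```

The mapping keeps the total mass and the overall center of mass unchanged while cutting the number of coordinates to track. Deciding which information has to survive the mapping is the hard part the text refers to.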

It's quite easy to give real-world examples. In fact, the latest thing we did got picked up by Toyota, which is a famous car manufacturer. They put out a patent back in the late 80s, which was going to the thing I mentioned earlier, the desirability of creating what are called nanocomposite materials, no metal in them, which would be as strong and durable as steel and other things. Their first patent on that came from mixing a material as banal as clay with nylon, and they found some extremely interesting properties. In fact, within a few years, they were making some. They still make some car parts out of that, but not the entire frame.

The idea is, tell me, as the experimentalist, what ingredients I should mix in order to get the important properties I want: low density, strength, toughness, that the thing won't undergo fractures, et cetera. Fracture is a classic example of a multi-scale challenge. It involves, at the smaller scale, a chemical bond breaking. That's an electron rearrangement, so there's going to be quantum mechanics in this. Its manifestation on a larger scale could be that the whole wing of an aircraft is fractured as a result of that bond, so we need to know how those things are connected, and we have to find ways in these scenarios to stop it. I need to be able to design a material (it could be a so-called self-healing material) that just doesn't allow a fracture to propagate. At the larger scale of everyday life, it's clear what we're trying to do here, and it just requires all of these bits and pieces to be brought together.

At this point, the biggest motivation for me has been the purely intellectual one. How do you do this kind of thing? When you have some ideas that are good, you find you can apply them in particular instances. I'm now telling you about where we can apply this, and I haven't been talking to them for very long. But the same approach is necessary when I'm dealing with these medical issues because I know I've got to get the molecular end in, but it could have a manifestation that might be in a heart arrhythmia or something.

Here's an example of the sort of thing we're after: In the genomic era, we're going to know, if we don't already, individual genome sequences. This is happening already for cases like HIV because we know the sequence of the virus that's doing the infecting, which is much shorter than ours, so a virologist will get that sequence on a patient. That's useful information because if you know what the sequence is it might give you an inclination as to which drug to give the patient. Well, how would you do that? Existing approaches are Baconian. Data is collected on endless patients and stored in databases and some form of expert system is run on them. When a new sequence comes in from a virologist, it's matched up to everything that was done before and someone will infer that the best treatment now is the same thing as what was done for some group of people before. This is not a reliable approach to individual medical treatment.                                 

If you can find a Popperian method, you'd be much better off. What is that method? That's one of the things I'm interested in, and that is doing sequence-specific studies from the virus—how it binds to individual drugs. It's no longer a generic task that a drug company is interested in. The drug companies have their problems now. They're trying to produce drugs as blockbusters, one-size-fits-all. This is not going to work in the future anyway. We have to tailor drugs to individuals. The challenge there is, can I match drugs to individual sequences? That's quite a demanding thing. It has quantum mechanics in it, it has classical mechanics, and it connects up to the way the patient is treated clinically. It too is a multilevel thing.                             

This is a great example, not only of having to do that on a patient-specific basis as an academic exercise. I need the answers on a timescale that's relevant to a clinical decision, otherwise, it's academic in the worst sense of the word. I'd publish a paper which looks good, but the patient died. That's another part of what I'm interested in doing—getting these answers on very fast timescales. It turns medical science into one of the biggest challenges in computational science that exists. First, I have to have secure patient data, which has all those privacy issues around it. Then I've got to launch pretty powerful computations in a hurry and get the answers back to some clinician. This is all, what I'm telling you, state-of-the-art, but imagine a medic's training today. They haven't got a clue what I'm doing.

The MRC (Medical Research Council), distinguished though it is, has funded many people who've gotten Nobel Prizes in physiology and medicine, but it doesn't fund anything which has anything to do with high performance computers. It doesn't understand their role, and that's because the peer review group are people who are trained in the antediluvian approach to the subjects.

Where do I get my funding from? It's like other things that I do. Origins of life, I contribute to, but you can't get large funding opportunities there. I do it with the resources I get from other places in what I call the interstices; it's not obvious to people, including myself, where it's coming from next.

There are limits to what we're talking about here. I'm not trying to go from the blockbuster one-size-fits-all to every single person has their own. That would be a reach too far all at one fell swoop. But there's this idea of stratification, which simply means clustering into groups for whom we know there may be adverse reactions to the drug that's on the market. It is quite shocking how low a percentage of the population can respond positively to the drugs that exist. In cancer, it's well under 50 percent. In a sense, scientifically, we owe it to people to understand better what drugs to give them. It's not a question of suddenly having to give everyone a different one, but finding different sets of drugs. It does challenge the model, which typically takes 2 billion dollars and up to seventeen years to produce one drug. I need lots more of them. Computational methods of the sort I'm talking about are going to have an impact on speeding all that up, no question.

At the moment, in the UK we're supposed to have the biggest such project in the world, which is this 100,000 Genomes Project. It's a personal initiative of Prime Minister David Cameron. He put 100 million pounds sterling into this, on the order of two or three years ago, and it's just getting going now. It's looking at diseased patients, that number of them, to try and make sense of their genomic sequences. The approach there will be overwhelmingly Baconian, by the way. Somehow, the idea is that we will get enough information that it will help us with drug discovery. But you see, drug discovery needs this more Popperian approach. I have to have a specific drug that I design. I can't just do random, stochastic methods of throwing trial entities at people, and hoping it's going to work.

It's such an intensive approach today. It's expensive and labor intensive. One of the points I was making earlier is that a pharmaceutical company is unlike many companies that make chemical products. Actually, most companies, when you're talking about shampoos and other things (it might be a company like Procter & Gamble or Unilever), don't actually have chemists to make these compounds. They reach for the shelf and look for suppliers who provide a set of chemicals that they will mix because those suppliers are in the business of doing the chemistry. But they know what it takes to make their compounds, and those are the ones the other companies will choose from.

If you're doing pharmaceutical design, it's very different. You have to make your own compounds, and that's time-consuming. That goes to this problem that chemistry is often irreproducible. People in a drug company will try to make a drug that looks like something that someone synthesized before, and they have to carry out certain steps. Unfortunately, there are usually several. Even if the compound looks similar to what was reported in the literature, some steps just don't work. I know the big pharma companies now try to collect data on failed chemical reactions as well, to save themselves all the time of trying to make things that someone knew didn't work, but nobody reported it.

You can imagine what I would be advocating, and that would be for a lot more scientific basis to what's going on in medicine. My humble opinion is it isn't particularly scientific today. It's a lot of experience and rote knowledge, but it's not informed by proper mechanistic understanding. In the end, we need that kind of mechanistic understanding to have a predictive capability in the discipline.

The one or two examples I've given you, and I can give you plenty more, are all pointing towards the fact that we now have enough theoretical knowledge to build models that have predictive capability in the medical area. Today, many clinicians would admit to you that their decision-making is a bit of a finger in the air job. They have to take a decision—I know from discussing with medical colleagues that many would like to use better methods to support those decisions. That doesn't need to imply we're going to do away with doctors at all, but it's just enhancing the value and quality of the decision-making.              

It all plays into the fact that we do not want to do too much animal testing in the future. If you can have a virtual human model, clearly you can do testing on that, and you don't have to do the amount of animal testing we've been doing to date. There'd be more high-fidelity stuff. I can't refrain from also mentioning this, because I use this high-fidelity modeling and simulation for medical purposes. A lot of people who are not experienced with these things think, oh, how can we ever trust the outcome of a prediction, especially from a computer, if it's on a human being? Somehow, it's got to be 100 percent correct or we'll never use it. This is certainly not true. What does it mean for a model to be 100 percent correct?

High fidelity, as a term I use, is enough to be able to assist with clinical decision-making. There will have to be regulations that define what those things are. They have to be reproducible, so that it doesn't depend only on me doing it, but the next person will get the same result when they carry that procedure out. This is all stuff you can standardize. But I know it goes also to this military-industrial complex thing in the US, where that term I borrowed comes from nuclear weapons stockpile stewardship. That's the area in the US where the government is throwing huge amounts of money at computing. For example, with a test ban treaty in place, we have to do simulations of these things. And they set the milestones for the computer power in order to reach a level of fidelity that's deemed to be acceptable for some type of simulation of the test. This is just an ongoing thing in the US. If you can do it for nuclear weapons, then the scale of the computers that are there will match the things we need to do in these other areas, no question.

There's a question mark about who's going to pay for it, and in the US you might have to pay for your simulation if you want an enhanced result. I've been in discussions like that. For example, some of my work is funded by the EU in their eHealth sector, and there—the EU—the guys in the Commission in Brussels assume that everyone would get access to these techniques, it would be free at the point of delivery. But a US colleague would expect you'd pay a premium to get a computer simulation done to enhance your clinician’s decision. This is all part of the way that the future will evolve.

In the long run, to do something that's personal, it's going to be Popperian. It will have vestiges of Baconianism around it. I want to take and use data that would be about me. For example, it could be imaging data, genomic data, and other things. I just want that. I don't want to have a prediction of what is going to happen to me based on statistics from other people. That's still better than what people are doing today, because it can give good indications. Ultimately, we want these avatars that are personalized, as accurately as they can, to ourselves.  

It ceases, in the end, to be only something for people who are ill. It's relevant to people in a state of wellness. My friend Leroy Hood is talking all the time about wellness things. He wants to do a 100,000 Genomes Project on wellness, because he's not trying to do the disease case that the UK is about. Just to help people understand their predicament, and to take decisions, lifestyle choices, based on that information.