Episode
What is Biological Information?
Contributors
- Robert Lawrence Kuhn
- Information is Hot. Especially in physics and artificial intelligence: John Wheeler’s “It from Bit” speculates that information may be the foundation of reality! And we’ve pursued information in Closer To Truth’s Cosmos category. What about information in Closer To Truth’s New Life category? The messy, wet world of biology? I’m now thinking, what does information have to do with life? I start with basics. Information is central to science. For analyzing data, experimental and observational, and for conceptualizing ideas, discerning mechanisms, testing hypotheses. But does information in biology go further? Function beyond analysis? Meaning beyond metaphor? That’s why I go to philosophers of biology. I’m cautious… I fear getting swept up in the information metaphor of the moment. Still, what is the nature, or the kinds, of information in biology? What is biological information? I’m Robert Lawrence Kuhn, and Closer To Truth is my journey to find out. In one sense, information is a common term, informing messages and conveying meaning. In another sense, Information Theory is a technical expression, to describe the accuracy of transmission across a communications channel. To grasp the use and role of information in biology, I speak with philosophers of biology.
I begin with a philosopher of science and biology who deals with diverse kinds of information. From evolution and natural selection to animal sentience and consciousness: Peter Godfrey-Smith. Peter, you’ve worked in biological information. Is there some real substance to it? - Peter Godfrey-Smith
- There’s certainly substance to it. I have a – what I think of as a somewhat restricted, or a hold-your-horses, attitude to the love of informational concepts within biology. I think that the way the genes work involves a kind of informational specification of some of their products, the protein molecules that they make. Then the proteins go off and do all their different things. The idea of genes as a kind of memory, and a kind of control device with an analogy to computation, those are real discoveries. From there, some people think, right, you know, evolution itself is a giant, information-using, or information-involving, process, or life itself is fundamentally informational, and that’s my hold-your-horses stage. I think – No, no, no – I think we’ve learned some things about the nature of gene action that give a surprising and real role to concepts of information. The idea that the genes code for some of their products – it’s quasi-metaphorical but a very firm metaphor if it is metaphorical. Whereas the idea that evolution is an information-processing activity, people might find it helpful to think that way, but I don’t think it’s really true. In the case of genes, the idea of a code, I think, is the concept that really bears some weight here. Where you have, you know, an order of elements, a kind of finite alphabet-like set of elements in the genes, and a regular, reliable process by which they specify the order of the amino acids within a protein molecule. You know, the term genetic coding is properly applied to that step in a causal chain. And that’s a real discovery, that that’s how it works. Now, it’s tempting then to say, and all the downstream stuff, all the stuff that goes on in the rest of the organism when those protein molecules go and do what they do, that’s informational as well.
And I would say, no, no, we really learned something surprising about the way that genes work through this alphabetic, rule-governed – a kind of templating operation by which one kind of molecule in the genes gives rise to the exact ordering, more or less, in a quite different kind of molecule in the proteins. That’s the home, I think, of information talk and code talk within that part of biology. And you know, and as I say, I resist the attempts to sort of then make it into everything.
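Godfrey-Smith’s “alphabet-like set of elements” with a “regular, reliable process” can be sketched as a lookup from triplet codons to amino acids. A minimal illustration, assuming nothing beyond the standard genetic code (only a handful of its 64 entries are included here):

```python
# A small excerpt of the standard genetic code: a finite alphabet of
# three-letter codons reliably specifying amino acids.
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "AAA": "Lys",
    "GGC": "Gly", "TGA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Read a DNA string three letters at a time, mapping each codon
    to an amino acid and stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "???")
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

print(translate("ATGTTTAAAGGCTGA"))  # ['Met', 'Phe', 'Lys', 'Gly']
```

The templating step Godfrey-Smith singles out is exactly this kind of rule-governed mapping: the order of one kind of molecule fixes the order of a quite different kind.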
- Robert Lawrence Kuhn
- Yeah, so basically, we need to look at each biological process and determine whether information is something fundamental or not?
- Peter Godfrey-Smith
- Right. Is there a reading-like process, or not? Is there a kind of genuine alphabet-like structure, or not? Those are the questions to ask.
- Robert Lawrence Kuhn
- I hear two kinds of information in biology: Strong and literal, exemplified by genes and the genetic code. Speculative and metaphorical, as with evolution as a whole, where information describes entire populations or classifies species. Fair enough. But, even in the genetic code, where information is clearest, what is the nature of the information? Is it like computer code? How deterministic? How error prone? I ask a British-Australian philosopher who engages biological information in development, heredity, and evolution: Paul Griffiths. Paul, the subject of information is, shall we say, trendy in the sciences. And that’s why, when I began to see biological information theory, I approached it a little skeptically.
- Paul Griffiths
- I think it’s good to be suspicious. I mean, you know, in the 17th century, organisms are clockwork. In the 19th century, organisms are steam engines, and so, of course, in the 20th century, organisms are going to be computers, right? I guess the picture that most people who read popular science would have is something like this. Over evolutionary time, evolution writes information into your genes, just like writing computer code. And when an egg is fertilized, that information is read out of the genes, and it creates the organism, okay? And for a lot of my career, I’ve basically been writing critiques saying that there is no real biology which literally corresponds to that vision. The core, reliable, well-confirmed understandings we have of evolutionary development do not look like a picture in which a program for building an organism is written into the DNA like computer code.
- Robert Lawrence Kuhn
- So, what’s a piece of evidence to support that?
- Paul Griffiths
- Well, mainly the role of the environment around the DNA in development. Everybody knows about the genetic code. And the genetic code is a language in which you can specify the linear order of amino acids in a protein. Now, I can’t tell you to grow a leg in the genetic code for the same reason that I can’t write a great novel using the Navy flag signals, if all I can say is “weigh anchor now” and “diver beneath,” right? What I can do is to tell the cell to put amino acids in a particular order. Now, in fact, even that process of putting amino acids in a particular order is only one of the ways in which genes do the most basic thing they do, which is to make functional RNAs and proteins, because those RNAs and proteins are affected by other processes, which can cut and paste the sections together, which can flip them around, and can even edit them one little molecular unit at a time. So, the process of even getting a structural protein out of your DNA is this highly interactive, regulated, cell-level process.
- Robert Lawrence Kuhn
- Sure.
- Paul Griffiths
- So, really, there are two different things going on that are real science. One is the genetic code, and the idea that a program for building an organism is written in the genetic code is simply not true. The other is the sense in which you can think about the ways in which different parts of the DNA affect each other, the way you can think about a cybernetic diagram which shows negative and positive feedback loops, and generally computes some things. And the reason I think that doesn’t vindicate the idea that the genetic program is written in your DNA is that if you ask yourself why the DNA does something at a particular place, at a particular time, you have to put in lots of other control engineering, more loops, more feedbacks, involving lots of other molecules that are in the cell. In my own work, I’ve argued that we can simply understand causation via information theory. And if we can understand causation via information theory, then, of course, we can understand the flow of causation through biological networks. But the closer you look, the less the use of all of these informational ideas and programming ideas in real science looks like this story that evolution writes a program for building an organism into the DNA, and then some simple operating system just reads it out and gives you an organism.
- Robert Lawrence Kuhn
- And that’s not right.
- Paul Griffiths
- I think that’s about as true as the 17th-century view that an organism is like a watch. An organism is a bit like a watch, but it ain’t much like a watch, okay?
- Robert Lawrence Kuhn
- The genetic code is indeed information. But it’s not a simple “read out” as if from a computer memory. I like that. I’d not like biological information to be fixed and final. Life is not a computer hard drive. But biological information is not just encoded in our genes. It is also how our genes are expressed in populations. This means mathematical descriptions that relate frequencies of genes in DNA strands to frequencies of traits in large groups. This includes, I hear, three famous equations or formulas – the Price equation, Hamilton’s rule, and Fisher’s theorem. I go to England to meet a philosopher of evolutionary biology who assesses the math and meaning of these classical evolutionary principles: Samir Okasha.
- Samir Okasha
- Let’s start with Price’s equation, which is named after George Price, who was a maverick evolutionist working in London in the ’70s. And what Price did, in effect, is to just simply produce a very simple formalism, just a description or a decomposition of the total evolutionary change in a population from one time to another, into two components. So, by the evolutionary change, Price was thinking primarily of the change in gene frequency between one generation and another. But in fact, one can think about the frequency of any trait at all, if you like, say the change in the average value of a quantitative trait, such as height, in a population. And what Price did was express that change as the sum of two components. One of those components reflected, in effect, natural selection, and captured the extent to which differences in the trait, the gene, correlate with differences in reproductive fitness – in how many offspring you leave, in short. And if that correlation is positive, then that means that natural selection is favoring the trait. And if it’s negative, then it means it’s disfavoring it. The second term of Price’s equation describes the transmission of the gene or trait from one generation to another, so captures predominantly effects such as mutation. So, if there is systematic mutation in favor of a gene over its alleles, then the gene’s frequency will increase over time. And what Price did is show that the overall change in frequency of a gene must always equal the sum of these two components.
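In standard notation, the two-part decomposition Okasha describes is the Price equation, where $z_i$ is the trait value of individual $i$, $w_i$ its fitness (number of offspring), and bars denote population averages:

```latex
\underbrace{\Delta \bar{z}}_{\text{total change}}
  \;=\;
\underbrace{\frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}}}_{\text{selection}}
  \;+\;
\underbrace{\frac{\operatorname{E}(w_i \,\Delta z_i)}{\bar{w}}}_{\text{transmission}}
```

The first term is positive exactly when the trait correlates positively with fitness, and the second collects transmission effects such as mutation, just as Okasha says.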
- Robert Lawrence Kuhn
- Hamilton’s rule?
- Samir Okasha
- Hamilton’s rule was originally designed by William Hamilton, the English biologist in the early 1960s, as part of his attempt to understand how on earth it’s possible for altruistic or self-sacrificial behaviors to evolve. So, ordinarily, evolution leads us to expect that individuals should exhibit behavior that benefits them, not others. And so, in general, if there was a heritable tendency to behave altruistically towards other individuals – that tendency would have to be counter-selected. Now, Hamilton realized that the logic of that argument breaks down if the help is directed not at randomly chosen members of the population but at relatives, basically, because relatives share genes. So, if you imagine there’s a gene that causes altruistic behavior so long as that behavior is directed at genetic relatives rather than non-relatives, in principle, natural selection may lead the behavior to evolve. Because the cost to the individual of being altruistic is offset by benefit to other related organisms. And Hamilton’s rule simply formalizes that insight.
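In symbols, Hamilton’s formalization says that a gene for altruistic behavior can be favored by natural selection when

```latex
r\,b \;>\; c
```

where $r$ is the genetic relatedness between actor and recipient, $b$ is the fitness benefit to the recipient, and $c$ is the fitness cost to the actor. The cost of helping is offset precisely when the benefit, discounted by relatedness, exceeds it.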
- Robert Lawrence Kuhn
- Right, okay. Fisher’s theorem?
- Samir Okasha
- Fisher’s theorem, the so-called fundamental theorem of natural selection, has long been a source of controversy in evolutionary biology. This came from Ronald Fisher’s famous 1930 book, The Genetical Theory of Natural Selection. And what Fisher did, in his own words, was to discover what he called the supreme principle of the biological sciences. All Fisher’s theorem says is that the rate of change of the growth rate of a biological population, with respect to time, will be proportional to the amount of additive genetic variation in the population. Additive here roughly means heritable. And the key point is that the additive genetic variance in a population is a non-negative quantity. So, it can’t be below zero, and will typically be positive. So, it seemed as if what Fisher was saying was that the population’s growth rate always had to increase. But Fisher’s theorem doesn’t deal with the entirety of the evolutionary change, only with the part that he thought was attributable to natural selection. So, natural selection, if you like, has always got to be improving a biological population, even though that might be offset, and usually will be, by other factors.
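One common modern rendering of the theorem, with $\bar{w}$ the population’s mean fitness and $\operatorname{Var}_A$ the additive genetic variance, is:

```latex
\frac{d\bar{w}}{dt} \;=\; \frac{\operatorname{Var}_A(w)}{\bar{w}}
```

Because a variance can never be negative, the selection component of the change in mean fitness is non-negative, which is exactly Okasha’s point that selection, taken on its own, always “improves” the population even when other factors offset it.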
- Robert Lawrence Kuhn
- If you had to generalize the importance of these three simple mathematical formulas, what would you say in terms of their use today?
- Samir Okasha
- What they all do, or try to do, is provide extremely general principles that capture something that’s intuitively right about how natural selection works and that contain deep biological insight, if you like… Try to, I say.
- Robert Lawrence Kuhn
- The elegance! Three short, simple equations or formulas describing the rich, central tenets of evolutionary biology. Relating genotypes, the genes inside, to phenotypes, the traits outside. This kind of information is relational: how two or more variables or factors relate, co-vary, or move together. Now, what about the other kind of information – semantic information, which bears meaning? I meet a philosopher who focuses on evolutionary and developmental biology. He leads a research project on agency and directionality in a science of purpose: Alan Love.
- Alan Love
- Information as semantic content is the everyday use that we oftentimes find ourselves appealing to as we say this has information about that. It’s the aboutness that is that semantic key. However, the notion of information as covariation is more connected to information theory developed by Shannon and others in the 20th century. And there, the important thing is reliable covariation of features that allow prediction. So, when in the biological sciences, people talk about information, they sometimes slide back and forth between these two, but they need to be importantly kept separate.
- Robert Lawrence Kuhn
- Which of the categories of biological sciences lends itself to information in the technical covariation sense?
- Alan Love
- Probably the most significant would be systems biology, where people are trying to do large data analyses of, say, a cell and – they’ve measured a bunch of properties, and then they’re looking for ways in which the properties of one are predictive of the properties of others at a later time point. And so, then you can talk about information content because you can actually quantify over those reliable connections in the system with a gigantic data set.
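The “reliable covariation that allows prediction” Alan describes is what Shannon’s mutual information quantifies: how much knowing one measured property reduces uncertainty about another. A toy sketch, where the data and the “expression level” labels are invented purely for illustration:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired discrete observations."""
    n = len(xs)
    px = Counter(xs)                 # marginal counts of X
    py = Counter(ys)                 # marginal counts of Y
    pxy = Counter(zip(xs, ys))       # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Perfectly covarying measurements carry one bit of information...
a = ["low", "low", "high", "high"]
b = ["off", "off", "on", "on"]
print(round(mutual_information(a, b), 3))  # 1.0

# ...while unrelated measurements carry none.
print(round(mutual_information(["a", "a", "b", "b"],
                               ["x", "y", "x", "y"]), 3))  # 0.0
```

With a gigantic data set, as Alan notes, the same quantity can be estimated over many measured properties at once to find which ones are predictive of which.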
- Robert Lawrence Kuhn
- You may have some correlation, but what are the error bars around it, I think, is a critical factor.
- Alan Love
- You have information, and you have noise, and noise then is going to be something that relates to the error in measurement and also to the natural stochasticity of the system. The system could just be noisy, and we need to recognize that.
- Robert Lawrence Kuhn
- In the neurosciences it’s the case, and it may be in evolution, that noise is not just a bug of the system but a feature of the system, where you couldn’t do it without the noise.
- Alan Love
- And I think that is also a counterintuitive thing because it’s actually not just the information, but it’s the noise that makes a difference. All kinds of key decisions in living systems, such as when does a cell get its fate specified, actually can be connected with natural noisiness and stochasticity in these systems, and we better understand that now than we did even several decades ago.
- Robert Lawrence Kuhn
- Right, on the other side, on the semantic information, are there big chunks of that that you can, kind of, as a philosopher, disaggregate and look at – look at different large sections of semantics?
- Alan Love
- Yeah, I would say an area where the semantic notion is very important for the scientific work is in animal communication. So, trying to understand how the signals that animals give convey information about situations. Take a warning signal, right: when one member of a group of rodents makes a call to alert the others to an avian predator nearby, does that signal need to communicate only danger? Does it need to communicate aerial danger? Many people would argue that the use of semantic information is problematic in areas of biology and actually leads to inferences that are inappropriate or inflated in certain ways. So, the discourse of information, I would think, is a mixed alloy. There are times when information language is used, and causal language would actually do just fine and be more precise. And there are other contexts where the information language might be necessary, but the slipperiness between those two can lead people astray. So, there’s reason to pay attention.
- Robert Lawrence Kuhn
- Alan distinguishes two sharply different uses of the same term “information”. Covariational, or relational information, which is quantifiable and algorithmic and enables prediction. And semantic information, which is qualitative, has “aboutness,” and conveys common or idiosyncratic meaning. Alan advises caution with semantic information, because simple causal or descriptive language is often more precise. Given the two kinds of information, what are the practical applications in biology? Why is computational biology becoming so prominent across the biomedical sciences? I explore biomedical information with a physician and professor of genomic medicine and translational bioinformatics: Joshua Swamidass.
- Josh Swamidass
- We’ve been in this weird spot where computational biology is really forming a new discipline within biology, which ends up being like the thumb touching all the other different fingers in biology. No matter what you want to study or how you want to study it, it seems biology is producing just large amounts of data that really quickly outstripped what many biologists were able to do. A lot of the early techniques, and even the modern techniques, of getting genetic data really required the development of new algorithms, and new hardware even, to be able to reassemble genomes from what was actually the sequencing data.
- Robert Lawrence Kuhn
- Couldn’t have really matured or begun very effectively without computation.
- Josh Swamidass
- Yeah. That ended up being only the beginning, right? It’s in the ’90s that computational biology really begins, but once you have these big, long sequences of As, Gs, Cs, and Ts…
- Robert Lawrence Kuhn
- About how long is that? Three billion or?
- Josh Swamidass
- Well, for the human genome, it’s three billion. Some of it is just noise and error, but how do you actually process that amount of data? Very quickly, that starts to outstrip your ability to, kind of, by hand move sequences around and align them. Then even once you have that aligned genome, an aligned genome’s about just two gigabytes of data, what does it mean? Early on, biologists recognized that, okay, we’re going to need people who are specialized in this.
- Robert Lawrence Kuhn
- Right.
- Josh Swamidass
- There’s a little bit of a bias, or there has been, at least historically, towards experiments, meaning like wet lab experiments, in biology. And genetic data really started to challenge that, saying well, maybe it’s okay if some people actually specialize and all they’re doing is experiments in a computer. In general, the way it works in biology is that we can usually, kind of, think about the rules that give you a first sort of approximation of how things are, but then we also know that there are exceptions. Those exceptions also have a certain pattern to them. Now, people also are looking at how to understand epigenetics in DNA. And there’s also imaging data too that comes up. I’m a physician as well as a computational biologist. For a long time, the hope has been that, you know, healthcare data would move to digital, which it has, and that we’d have ways to look at that to ask scientific questions. This is interesting, right, because what we really want to do, honestly, is be able to understand, in ethical ways to be clear, how human biology works using humans as the model organism.
- Robert Lawrence Kuhn
- Information pervades biology. Information increasingly defines biology. Advances in all the biomedical sciences are now driven by information and its processing. Here is what I hear: What’s best is perhaps a restricted or middle-ground approach to information – information-based research works well in molecular biology & neuroscience. Less well in re-imagining evolution as “a giant information process.” Information is essential in both evolutionary & developmental biology, but there is no program for building complex organisms written into the DNA like computer code. The power of simple formulas to describe elaborate biological relationships between internal genotypes or genes, and external phenotypes or traits, puts evolutionary biology on a more scientific foundation — and may provide special insights. It’s crucial to distinguish covariational, or algorithmic, information, which enables prediction, from semantic information, which conveys meaning or “aboutness.” Computational biology is transforming genetics, protein interactions, drug development, clinical imaging – enabling the human being to be, with safety, the model organism. Biological information – Metaphor? Facilitator? Catalyst? Or… Transformative? Revolutionary? Paradigm shift? Something of each, I suppose, getting us… Closer To Truth.