What is molecular population genetics?
Abstract
Chapter 1, “Introduction: What is molecular population genetics?,” presents the motivations, applications, and historical context for molecular population genetics as a subdiscipline within biology. It describes how changes to DNA are inextricably woven into thinking about evolution and how molecular population genetics can be used to transport our thinking backward and forward through time. Key classic theoretical ideas summarizing allele frequency change, probability of fixation, and the time to fixation are encapsulated in brief vignettes. Both fundamental and applied uses of molecular population genetic perspectives are summarized in this survey of the historical, conceptual, and empirical development of the branch of science that we call population genetics and its integration with DNA sequences.
Do you want to travel through time? Well, you’re in luck. Because molecular population genetics is a time machine and, for many purposes, it may be the best we’ll ever have. It transports us backward through time, forward through time, takes snapshots of points in time. When we look out of the windows of this time machine, we don’t see wormholes and ethereal gases spinning in spirals—instead we see the evolution of the spiral helix of DNA. We see mutations arise and spread like ripples through populations, producing adaptations for all of life, the baleen of the whale, the eye of the squid, the horn of the rhinoceros, the light of the firefly (Figure 1.1). We see our ancestors winking at us at the far ends of gene trees. We see what we share with chimpanzees, with kangaroos, with clams and algae and Salmonella. We see some populations explode with growth to expand across continents and others shrink to petite refuges and still others spring like rubber bands, their numbers bigger and smaller and bigger and smaller, over the passage of time. We see animals shuttle back and forth as migrants between different groups, and we see them stop their shuttling, changing those groups forever. We see it all in our genes, in the simple differences of those four nucleotide letters of DNA that share parts of their history with everyone else. The time machine of molecular population genetics lets us get inside evolution to see what is going on and to really understand how it unfolds.
1.1 On the origins of molecular population genetics
Everyone now takes it for granted that changes to DNA are inextricably woven into thinking about evolution (Box 1.1). But, of course, this was not always the case. Evolution, as Charles Darwin wrote about it publicly from 1859 until his death in 1882, describes the process of heritable trait change across generations. The mechanism of the “heritable” piece of this process, however, was unfortunately and famously unknown to Darwin. This missing link stymied a quantitative view of evolutionary change for a while. The logic of genetics only became appreciated in 1900, after the rediscovery of Gregor Mendel’s cross-breeding experiments that he had performed four decades earlier. Mendel demonstrated that genes are “particulate,” inherited intact from parent to offspring, and that their transmission from parent to offspring follows certain mathematical rules. With this simple insight, progress was set to accelerate.
Now, with more than a century of genetic research in hand, we can define evolution in the most basic of ways: evolution is the change across generations in the relative abundance of different forms of genes in a population (Box 1.2). The clarity of genetic thinking spurred mathematical summaries of evolutionary change that reinforced and elaborated on Darwin’s foundation for how to conceive of the living world (Boxes 1.3 and 1.4). The mathematical theory and empirical analysis of changes to gene frequencies over time is the branch of science that we call population genetics.
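The definition of evolution as change in gene frequencies across generations can be made concrete with a minimal sketch. It assumes a Wright-Fisher population (a constant number of gene copies, random mating, non-overlapping generations, and no mutation, migration, or selection), so chance sampling alone moves the frequency. The function name and parameter values are illustrative assumptions, not taken from this chapter.

```python
import random


def wright_fisher(p0, n_copies, generations, seed=None):
    """Track the frequency of one allele under genetic drift alone.

    Assumptions (illustrative): constant number of gene copies, random
    mating, non-overlapping generations, no mutation, migration, or
    selection.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the
        # current generation, so the new allele count is a binomial sample.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # Hypothetical values: 200 gene copies followed for 50 generations.
    freqs = wright_fisher(p0=0.5, n_copies=200, generations=50, seed=1)
    print(" ".join(f"{f:.2f}" for f in freqs[::10]))
```

Running the sketch with different seeds gives different trajectories from the same starting frequency, which is the sense in which drift is a chance force. Once the frequency reaches 0 or 1 it stays there, because this sketch includes no mutation to reintroduce the lost form.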
The basic ingredients to population genetic theory about evolution were firmly established in the first half of the last century by Ronald Fisher, Jack Haldane, and Sewall Wright, among other luminaries of biology (see Timeline of molecular population genetic history). The five major forces of evolution—mutation, genetic drift, migration (gene flow), recombination, natural selection—secured their place in evolutionary thinking nearly 100 years ago. All this discovery took place without any knowledge of what is the principal heritable material that transmits from parent to offspring: DNA. This classic and exceptionally general population genetics theory forms the backbone of modern evolutionary understanding (Box 1.2), including how we think about the evolution of DNA sequences. Michael Lynch (2007) emphasized the importance of understanding population genetics in saying “nothing in evolution makes sense except in the light of population genetics” in his saucy rephrasing of Theodosius Dobzhansky’s famous quote about the light that evolution brings to biology.
Population genetics calls for two key ways of thinking, and both are foreign to most of us in our everyday individual experience: population thinking and tree thinking. We are used to how cause and effect interact for an individual, like how when you take a bite of an artisanal chocolate, you sense a delicious taste in your mouth. Population effects are a bit different. The consequences for the population result from the collective effects on individuals: you tell your friends about the new chocolatier in town, and pretty soon the general vibe of the neighborhood is much happier from all the endorphin stimulation from the increased chocolate consumption. In population genetics, we are concerned with the genetic outcomes for a population as a consequence of the survival and reproduction of its individual members. How many individuals have this gene copy or that one, and how well overall does a given copy propagate to the next generation? Another component of population thinking is that we generally do not have information about every individual; we just have data for a sample of them that we must use as a representative group, presuming that they are a random subset of individuals.
But individuals in populations are not static or fully independent of one another: their composition changes over time and their gene copies are related to one another. These features lead us to tree thinking. The idea is that we can use graphical branching diagrams to give a concrete representation of relationships between gene copies that are present in different individuals, whether those individuals are members of a single species or even from different species (Figure 1.2). DNA sequences give us a natural basis for quantifying homologous features found in different individuals (see section 5.3), features that share a common ancestor but that have changed due to some or all of the factors that influence how common a given gene copy is in a population. A virtue of both population thinking and tree thinking is that we can use mathematics and statistics to integrate ideas and data to describe evolution in genetic terms.
The second half of the last century saw the dawn of molecular biology, after Alfred Hershey and Martha Chase confirmed DNA as the heritable material and Francis Crick and James Watson pinned down its chemical structure in the early 1950s (Box 1.1). The Central Dogma of Molecular Biology soon followed, delineating how transcription of RNA from DNA is followed by translation of protein from RNA.
From your day-to-day life, you are already familiar with many of the slight heritable variations between individuals in what traits they have: height, hair color, earlobe attachment. But even with this phenotypic variation, people still look like people, mostly having traits that they share and that make humans quite distinct from other species. What about DNA? How polymorphic would molecules be within a species? Richard Lewontin and Jack Hubby demonstrated in 1966 that protein differences among individuals were extremely common, and Martin Kreitman in 1983 showed that the DNA sequences for the gene alcohol dehydrogenase from a collection of Drosophila melanogaster fruit flies differed between every single copy that he looked at. Human molecular variation told the same story, as Harry Harris first saw for protein differences and Charles Aquadro and Barry Greenberg found for DNA. Everyone already knew that genetic variability existed, but molecules turned out to be rife with variation, with variability so much more pervasive than anyone could have guessed until they looked. Such molecular genetic variation in DNA represents the most fundamental kind of genetic variation.
Biologists realized that DNA sequence variation holds important clues to the past, about the history of populations and the evolutionary forces that shaped them, if only there were a way to wrangle it. As Jack King and Thomas Jukes wrote in 1969, “Patterns of evolutionary change that have been observed at the phenotypic level do not necessarily apply at the genotypic and molecular levels. We need new rules in order to understand the patterns and dynamics of molecular evolution.” What to do?
This connection between the concept of genes and alleles with their physical and chemical basis spawned a new series of mathematical models of evolution. These models built on classic population genetics theory (Boxes 1.2, 1.3, and 1.4), but extended it to incorporate realistic details from molecular biology. Importantly, it is that same set of five key evolutionary forces from the pre-molecular age of evolutionary thinking that also influences changes at the molecular level. The most important among these theoretical developments arrived in 1968: the Neutral Theory of Molecular Evolution, introduced by Motoo Kimura.
What was so inspired in Kimura’s theory that set it as a key milestone in evolutionary biology? His research introduced elegant and deceptively simple predictions about how changes in DNA ought to work. The Neutral Theory provides a null model that we can compare to observed patterns of genetic variation from the real world. This comparison gives us a “test of neutrality” to take those evolutionary forces that depend on chance events, the neutral forces, and see how well they can do on their own in explaining changes to DNA. The “standard neutral model” is the simplest null model based on the Neutral Theory, and it uses many of the same assumptions as other classic population genetic models (random mating, stable population size, mutation-drift equilibrium). Like all models, the standard neutral model is thus a simplification of the natural world. Scientists must look at models with their eyes wide open, taking the same outlook that statistician George Box so aptly invoked: “All models are wrong, but some are useful.” The oversimplification of models, including the Neutral Theory, is intentional and we can use it to help us understand the additional complexities of nature.
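One way to picture comparing a null model against data is a sketch built on two standard summaries of DNA sequence variation, neither of which is defined in this section: the mean number of pairwise differences (often written π) and the number of variable sites S scaled by a harmonic number (Watterson's estimator). Under the standard neutral model the two have the same expected value, so a large gap between them hints that some assumption does not hold. The toy alignment and the function names below are hypothetical, and a real test such as Tajima's D would also scale the gap by its expected variability.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns at which the sampled sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pairwise_diversity(seqs):
    """Mean number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Segregating sites divided by a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


# Hypothetical toy alignment: four gene copies, ten aligned sites.
sample = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]

if __name__ == "__main__":
    pi = pairwise_diversity(sample)
    theta_w = watterson_theta(sample)
    print(f"S = {segregating_sites(sample)}")
    print(f"pi = {pi:.3f}, theta_W = {theta_w:.3f}, difference = {pi - theta_w:.3f}")
```

For this toy alignment, S is 2, π is 1.0, and Watterson's estimate is about 1.09. Four short sequences are far too few to support any conclusion; the point is only the shape of the comparison between observation and neutral expectation.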
We can, of course, make this molecule-inspired-but-simple model of evolution even more realistic. We can incorporate added layers of biological complexity. One of the aims of this book is to point out how well and how poorly the oversimplified models perform, when it matters, and how to modify them appropriately. The most obvious oversimplified piece of Neutral Theory is its lack of integration with the process of adaptive evolution by natural selection. Many biologists focus on natural selection as the main non-neutral force of interest, as our innate inquisitiveness about all the life around us often leads us to ask about how organisms adapt to their world (Figure 1.1). Ironically, we can use neutrality to learn about selective non-neutrality. Another common goal that Neutral Theory helps us with is detecting demographic changes in a population’s history using genetic data, as in understanding our own human past through “molecular anthropology.”
How much evolution at the molecular level can be explained just by chance evolutionary forces, by mutation and genetic drift? Tests of neutrality once were limited to small individual cases, to evaluate how single genes are affected by natural selection. But modern molecular population genetics has scaled up to let us scan across entire genomes. Think for a moment about what it means to do such “molecular population genomics.” If you were to analyze the DNA for just 50 humans, you would be dealing with 300 billion nucleotides and over 10 million nucleotide differences. And nowadays, many thousands of human genomes are sequenced on a regular basis. How to make sense of all that? The ideas and techniques of molecular population genetics show us the way.
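The 300 billion figure can be checked with a line of arithmetic. The sketch below assumes a round haploid human genome size of 3 billion nucleotides and counts both copies carried by each diploid person; those two values are assumptions introduced here for illustration, not numbers stated in the text.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # assumed round figure for one genome copy
PLOIDY = 2                         # humans carry two copies of the genome


def total_nucleotides(n_individuals, haploid_bp=HAPLOID_GENOME_BP, ploidy=PLOIDY):
    """Total nucleotides across all genome copies in a sample of individuals."""
    return n_individuals * ploidy * haploid_bp


if __name__ == "__main__":
    print(f"{total_nucleotides(50):,}")  # 300,000,000,000
```

Under these assumptions the 50-person total matches the number quoted above, which suggests the figure counts both genome copies in each individual.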
1.2 What is the use of molecular population genetics?
The remainder of this book aims to introduce, in an accessible way, the bare essentials of the theory and practice of molecular population genetics. You can use this book to develop an understanding of the origins and implications of molecular diversity within populations, molecular divergence between populations and species, and how diversity and divergence connect with the five major forces of evolution. We will focus on the evolution that happens within species and that contributes to divergence between species that are very closely related, which is sometimes referred to as microevolution. As opposed to studying deep-time macroevolution of distantly related organisms (Figure 1.2), we will primarily have in mind a relatively short timescale for understanding evolutionary change, usually spanning from a few generations to a few million generations. This timeframe of DNA sequence evolution is the purview of molecular population genetics. We will filter these ideas through “genome thinking” to learn about the powerful and general forces that control the evolutionary process.
Molecular population genetics has its own abstract parts—for example, coalescent theory (see section 5.2), statistical models of sequence evolution (see section 5.3), the implications of selection interacting with genetic linkage (see section 7.1)—but, by and large, molecular population genetics is embedded in empirical patterns seen in real-world observations (see Chapters 2 and 3). We will take advantage of these concrete features of DNA to make sure we also grasp the more ethereal concepts. That is, how can we interpret patterns of DNA sequence differences within populations, between populations, and between species? Why does a given species have as much genetic variation as it does? What do those molecular signals tell us about natural selection and the demographic history of organisms? What are the roles of mutational input and genetic drift and recombination in genome evolution?
This list of questions might sound esoteric, but molecular population genetics also helps us deal with important and interesting problems in applied biology. What is the likelihood that this blood sample from a crime scene belongs to the accused individual? How much of human genetic material derives from extinct human species like Neanderthals? What genetic changes are associated with inherited diseases? How does HIV drug resistance evolve within an infected person? Where has this invasive crop pest species colonized from? Are populations of this rare organism actually genetically distinct enough to warrant protection under endangered species legislation?
Thus, molecular population genetics provides the tools and framework to address a wide spectrum of inquiry, with the emphasis of analysis depending on the goals or interests of the scientist. The topics span from purely academic problems that often extend over prehistoric timescales all the way to applied problems that operate on the most contemporary of timescales. At one end of the spectrum, we can think of the selectionist’s research program. From this view, one aims to identify the targets of natural selection in genomes, to infer the prevalence of selection across the genome, and to estimate the magnitude of selective differences at the molecular targets of selection. For these purposes, other evolutionary pressures, including demographic perturbations, represent “nuisance” processes that must be accounted for simply as part of the baseline null model to extract signal from noise.
At the other end of the spectrum, for the molecular anthropologist or phylogeographer or landscape geneticist, it is exactly these details of demographic history that formulate the key questions of interest. Think of applied problems that operate in the present day, such as conservation genetics, both for human-impacted populations and for human populations themselves. In fact, selection in the genome may be the “noise” if we want to know how the geography and distribution and abundance of individuals over time and space have left their marks in genomes for us to read. And still many other perspectives and goals abound for wanting to distill insights from molecular population genetic data, from forensic analysis, to epidemiological inference, to mapping of disease alleles, genetic dissection of animal and plant domestication, conservation biology, and molecular ecology.
In practice, these disciplinary subdivisions are not hard boundaries. Most practitioners of molecular population genetics are interested in each of these facets to differing degrees. The priority of your emphasis depends on the interests of you, the investigator, and the questions piqued by the particular biology of your study system. But you don’t want tunnel vision to blind you to alternative explanations. Consequently, it is important to gain a full appreciation for the influences of both selection and demography in molecular population genetics in order to address properly any given problem.
Because our aim here is to focus on molecular evolutionary change, I will leave to the side most things about the phenotypes of organisms and the details of how genotypes map to phenotypes (Figure 1.3). In Chapter 9, however, I will walk you through a series of evolutionary vignettes that do feature exciting phenotypic change and also feature all of the molecular population genetic tools from the preceding chapters. In the interim, when using phenotypic examples and analogies, I will primarily use obvious and direct connections between genotype, phenotype, and fitness. This, of course, is a big simplification. Some caveats to this simplicity will crop up as signposts for more advanced study of molecular population genetics beyond the scope of this book. What I want to do mainly is to distill down to its essence the logic of the population genetic process of evolution at the level of DNA as the physical basis of evolution. That essence will be the fuel that drives our molecular population genetic time machine to let us relish the splendor of evolution from the inside out.
Adamkewicz, L. and Castagna, M. (1988). Genetics of shell color and pattern in the bay scallop Argopecten irradians. Journal of Heredity 79, 14–17.
Casillas, S. and Barbadilla, A. (2017). Molecular population genetics. Genetics 205, 1003–35.
Charlesworth, B. and Charlesworth, D. (2017). Population genetics from 1966 to 2016. Heredity 118, 2–9.
Darwin, C. R. (1859). On the Origin of Species by Means of Natural Selection. John Murray: London.
Hartl, D. (2000). A Primer of Population Genetics. Sinauer Associates: Sunderland, MA.
Hubby, J. L. and Lewontin, R. C. (1966). A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura. Genetics 54, 577–94.
Kimura, M. (1968). Evolutionary rate at the molecular level. Nature 217, 624–6.
King, J. L. and Jukes, T. H. (1969). Non-Darwinian evolution. Science 164, 788–98.
Lynch, M. (2007). The frailty of adaptive hypotheses for the origins of organismal complexity. Proceedings of the National Academy of Sciences USA 104 Suppl 1, 8597–604.
Padel, R. (2010). “Giant Bugs from the Pampas.” In: Darwin: A Life in Poems. Vintage Classics: London.
Watson, J. D. and Crick, F. H. C. (1953). Molecular structure of nucleic acids—a structure for deoxyribose nucleic acid. Nature 171, 737–8.