Reassembling animal evolution: a four-dimensional puzzle
Abstract and Keywords
Drawing from the recent literature and the contributions in this volume, this chapter considers some of the progress recently made in the study of animal evolution and the hurdles that remain. Modern approaches to the study of animal evolution integrate palaeontology, evo-devo, phylogenetics, and genomic data, each in pursuit of a greater understanding of homology as a means of revealing patterns and processes through time and across multiple levels of biological organization. Gaps in our knowledge are inevitable, but with caution, careful sampling, and an eye towards new technologies and opportunities, we should not be deterred from inferring the patterns and processes we wish to untangle.
Drawing from the latest literature and the contributions in this volume, we consider some of the recent progress made in the study of animal evolution and the hurdles that remain. Each of the disciplines considered (palaeontology, evo-devo, phylogenetics, and the incorporation of genomic data) has made major contributions to our understanding of how animals have diversified. Together, these pursuits are bringing about a return to whole-organism biology, in which the link between genotype and phenotype is considered in the context of changing physical and biological environments. The modern approach integrates these sometimes disparate disciplines, with the aim of reconciling the available evidence to describe the patterns and processes that have produced the existing diversity of animal life.
Arguably, one underlying quest unites the goals of individual researchers: the search for homology, that is, recognizing it, defining it, and using it. Whether the aim is to establish shared common ancestry of form or of function, similar challenges face those contemplating strings of nucleotides, protein structure, gene expression, biochemical pathways, organ systems, or fossilized microstructures. As we move towards a greater understanding of evolution and of the biological entities undergoing selection, it is the study of homology that allows us to detect patterns and interpret processes.
Gaps in our knowledge can be daunting. At best they define the limits of our ignorance, and at worst they prevent any meaningful or confident interpretation of available information. We consider how some of the major gaps are being addressed with the renaissance of whole-organism biology, the development of improved models, and the advent of new technologies.
18.2 Phylogenies and phylogenetics
Since the first credible molecular estimate of animal relationships was published by Field et al. (1988), there have been a number of significant changes in our understanding of the evolution of the animal kingdom. The largest shift has been away from the widely held assumption of gradualism, whereby morphologically simpler animals such as flatworms were placed towards the base of the tree, and complex features such as coeloms and segments were thought to be homologous and to define major groups of animals higher up the tree. The tree widely accepted today has its roots firmly in Field et al.'s study and in subsequent studies that added to the sampling of small subunit (SSU) ribosomal RNA gene (rDNA) sequences; until recently, almost all the major revolutions came from efforts using SSU rDNA. Names such as Ecdysozoa and Lophotrochozoa draw upon shared morphological features, but the clades themselves were first recognized from SSU rDNA. The new animal phylogeny, hand in hand with comparative developmental studies of homologous gene expression, has forced a reassessment of the evolution and homology of many animal characteristics, and a recognition of the pervasive effects of character loss and secondary simplification of body plans (Copley et al., 2004; Jenner, 2004c), as is apparent in the flatworms.
While there has been enormous progress in our understanding of metazoan phylogeny, leading to broad agreement over the outline of the animal tree (Halanych, 2004; Telford, 2006), a number of questions remain hotly contested. Inevitably, the outstanding questions are the hardest to answer, and the difficulties encountered are likely to stem from multiple sources. The first major source of difficulty arises when living phyla emerged in an explosive radiation, leaving little time for the fixation of informative substitutions; this situation is exemplified by the difficulty of resolving relationships between the lophotrochozoan clades (Dunn et al., 2008). The second important source of difficulty arises when living exemplars are the product of unusual patterns of genomic evolution that violate the assumptions of the models used to reconstruct trees, resulting in inaccurate placement on the tree (Philippe and Telford, 2006). This is undoubtedly seen in the case of the acoel flatworms, chaetognaths, myzostomids, gnathostomulids, and various other ‘Problematica’.
The tendency for phylogeneticists to contradict each other over the placement of problematic groups may be rather frustrating to outsiders but is inevitable. First, all animals that have ever been described have also been positioned somewhere on a phylogenetic tree. Any progress to be made inevitably involves changing this position and hence introduces contradiction. Secondly, and alluded to above, all the easily solved aspects of the tree were answered 10 or 20 years ago, meaning anything currently worth studying is by definition problematic. A reliable phylogeny is fundamental to comparative biology and to our understanding of evolution, and progress continues.
The progress currently being made stems from the combination of four approaches: much larger data sets (phylogenomics), which reduce the stochastic error that arises from limited samples; data from additional representatives of problematic taxa, to avoid or reduce systematic error; alternative sources of data, such as microRNAs and, potentially, other rare genomic changes that it is hoped are resistant to homoplastic evolution (Rokas and Holland, 2000; Boore, 2006); and, finally, improved methods of tree reconstruction that more accurately model the underlying process of molecular evolution, so further reducing systematic error (Philippe and Telford, 2006). The biggest contributors to progress in terms of data are the new, cheap technologies for DNA sequencing. We are not far from the day when any given species with a ‘normal’-sized genome can have its genome completely sequenced for less than a single gene may have cost 25 years ago. This will provide the greatest possible source of data for phylogenetic analysis, and the resolution of any remaining errors will be the province of the model makers.
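The distinction between stochastic and systematic error can be made concrete with a toy calculation. In the minimal sketch below we assume, purely for illustration, that each informative site independently favours the true grouping with probability p_true; the function name prob_majority_correct and all parameter values are hypothetical, and real sites are neither independent nor this simple.

```python
from math import comb

def prob_majority_correct(n_sites: int, p_true: float) -> float:
    """Probability that strictly more than half of n_sites informative sites
    favour the true grouping, if each site independently does so with
    probability p_true (a deliberately simplified, hypothetical model)."""
    return sum(
        comb(n_sites, k) * p_true**k * (1 - p_true) ** (n_sites - k)
        for k in range(n_sites // 2 + 1, n_sites + 1)
    )

# Stochastic error: weak but unbiased signal (p_true > 0.5) wins as data grow.
for n in (11, 101, 501):
    print("unbiased", n, round(prob_majority_correct(n, 0.55), 3))

# Systematic error: if a model violation makes misleading sites the more
# common kind (p_true < 0.5), more data make the wrong answer more certain.
for n in (11, 101, 501):
    print("biased", n, round(prob_majority_correct(n, 0.45), 3))
```

With weak but unbiased signal, adding sites drives the probability of recovering the true grouping towards one; when misleading sites predominate, the same increase in data drives it towards zero, which is why larger data sets alone cannot cure systematic error.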
The frustrations inherent in reconstructing the phylogeny of living animals are echoed by the problems of palaeontology. Many fossils are hard to decipher, especially for outsiders, and confusion is exacerbated by vehement disagreements among experts over their interpretation. For example, since its description in 1886 the Lower Cambrian Emmonaspis cambrensis has been linked with graptolites (hemichordates), chordates, arthropods, and even Ediacaran frond-like organisms (Conway Morris, 1993b). Beyond the well-known problems of preservation and interpretation (Budd and Jensen, 2000), the most interesting fossils, those in the stem lineages of living taxa with the potential to show the order in which clade synapomorphies were acquired, are the hardest to interpret and to relate to modern groups, precisely because they lack many of those synapomorphies.
Despite the undoubted problems of palaeontology, fossils are unique in their ability to inform us about certain aspects of evolution (Smith, 1994). While comparisons of living taxa within an accurate phylogenetic framework give tremendous insight into the pattern of evolution, this approach remains limited by the fact that most of the evolutionary steps leading to living clades are not represented among living animals. For example, it seems clear that the closest relatives of the arthropods are to be found amongst the cycloneuralian worms. It is not clear, however, how much a comparison of priapulids and arthropods will tell us about the stages by which segments and jointed appendages were acquired in the arthropod stem lineage; in such cases, fossils can be of enormous importance.
The importance of studying fossil lineages for our understanding of the evolution of crown groups has been discussed. Stem-lineage fossils make an important contribution in several ways: they break the long branches leading to crown groups and show intermediate character states; they may reveal unsuspected character homologies, or indeed convergent evolution, between extant groups; they can highlight character loss in certain groups; and, finally, they provide the sole means of calibrating evolutionary trees, by giving minimum divergence times for living clades. Fossils can also provide the ecological background to specific evolutionary events, perhaps most spectacularly the great extinctions and the invasion of new habitats such as the land. All of this information is provided uniquely by fossils, and it is vital that evolutionary biologists do not dismiss fossil evidence too readily because of the difficulties inherent in the field. Palaeontologists themselves recognize the problems they face, and efforts are being made to strengthen the objectivity of fossil interpretation and to understand the limits of inference, e.g. in calibrating trees (Drummond et al., 2006; Marshall, 2008) and in interpreting biological evidence for historical events (Budd and Jensen, 2000; Domazet-Lošo et al., 2007; Peterson et al., 2007; Donoghue and Purnell, 2009). Newly discovered deposits, new tools to visualize internal and microscopic features, new methods of detecting and characterizing biomolecules, and simply returning repeatedly to problematic taxa in the light of new evidence will keep the study of fossils alive.
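The logic by which fossils give minimum divergence times is simple enough to state as code. A fossil that can be assigned to either of two sister lineages shows that those lineages had already separated by the time it lived, so the oldest such fossil sets a lower bound on the age of the node that unites them. The sketch below assumes the fossils have already been assigned correctly, which in practice is the hard part; the lineage names and ages are hypothetical.

```python
# Hypothetical fossil occurrences (ages in millions of years, Ma), assumed to be
# correctly assigned to one of two sister lineages; all numbers are invented.
fossils = {
    "lineage_A": [505.0, 480.0, 430.0],
    "lineage_B": [520.0, 500.0],
}

def minimum_divergence_age(ages_a, ages_b):
    """The oldest fossil on either branch is a hard lower bound on the age of
    the node uniting the two lineages; the true split may be older, because
    the earliest members of a lineage are unlikely to have been preserved."""
    return max(max(ages_a), max(ages_b))

print(minimum_divergence_age(fossils["lineage_A"], fossils["lineage_B"]))  # 520.0
```

The bound is one-sided: the true divergence may be considerably older, which is why how to treat the interval between the oldest fossil and the actual split is itself a subject of methodological work on calibration.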
18.4 Developmental evolution
A phylogenetic tree can describe the relationships of living and fossil taxa; mapping the characteristics of those taxa onto the tree permits us to track the evolution of those characters, showing in which groups, and even at what time, key morphological novelties evolved. While this combination of a dated phylogenetic framework and the distribution of characters provides a historical description, or pattern, of character evolution, understanding morphological novelty and how such change has occurred at the level of the genome and the embryo (the process of morphological evolution) requires study of the genetics behind changes in ontogeny (see, for example, Moczek, 2008).
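Mapping a character onto a tree can be illustrated with a deliberately naive rule. If a novelty is assumed to have arisen only once, it must already have been present in the most recent common ancestor of all the taxa that show it. The sketch below, using a hypothetical four-taxon tree with invented node names, finds that node; it does not test the single-origin assumption, and any descendant of the returned node that lacks the trait would then imply a secondary loss, which, as the flatworms show, is far from rare.

```python
# Hypothetical rooted tree stored as child -> parent links; all names invented.
parent = {
    "A": "n1", "B": "n1", "C": "n2", "D": "n2",
    "n1": "root", "n2": "root",
}

def ancestors(node):
    """Return the path from node up to the root, inclusive of both."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def origin_node(taxa_with_trait):
    """Most recent common ancestor of all taxa showing the trait. Under the
    (strong, unchecked) assumption of a single origin, the trait arose on the
    branch leading to this node."""
    common = set(ancestors(taxa_with_trait[0]))
    for taxon in taxa_with_trait[1:]:
        common &= set(ancestors(taxon))
    # The first shared node met on the way up from any one taxon is the MRCA.
    return next(n for n in ancestors(taxa_with_trait[0]) if n in common)

print(origin_node(["A", "B"]))  # n1
print(origin_node(["A", "C"]))  # root
```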
The birth of modern developmental evolutionary biology came 25 years ago with the molecular cloning of the homeobox motif from Drosophila homeotic genes (Carrasco et al., 1984; McGinnis et al., 1984), alongside the remarkable discovery that the same motif (and indeed the same genes, with conserved functions) existed in vertebrates. Comparative molecular genetic analyses of development have since changed our view of the evolution of developmental mechanisms and the origins of novel morphology, revealing surprising conservation and providing an alternative to phylogenetic proximity for determining homology. Current evo-devo research promises to extend this focus to new groups of organisms. While a great deal of progress continues to be made by comparing expression patterns (using in situ hybridization) to detect similarity of function in homologous genes and to identify homology of characters, the export of genomics and of true functional studies (e.g. RNA interference and transgenesis) to animals not previously considered model organisms is extremely exciting (see, for example, Abzhanov et al., 2008; Vera et al., 2008).
By expanding beyond the traditional model organisms, practitioners of developmental evolutionary biology are able to build on the discoveries of the phylogeneticists and palaeontologists to address some of the more intriguing questions in morphological evolution. Current questions revealed by the new animal phylogeny and palaeontological discoveries include the origins of arthropods from the cycloneuralian worms such as priapulids and kinorhynchs, the unexpected relationship of the deuterostome-like brachiopods to lophotrochozoans such as annelids and molluscs, and the possible origins of bilaterians from animals resembling the acoel flatworms.
In addition to investigating specific questions such as those mentioned above, developmental evolutionary studies also focus on the generalities of the genetics behind morphological evolution. A current debate concerns the relative importance of changes in regulatory DNA versus the coding DNA of genes (Carroll, 2008; Stern and Orgogozo, 2008; Wagner and Lynch, 2008). One thing on which both sides seem to agree, however, and perhaps a realization more fundamental than scoring points, is that changes of small effect predominate. Cis-regulatory changes are common because subtle changes can be made in independent enhancers, and coding changes occur where their pleiotropic effects are minimized. There is nothing new under the sun, however (Ecclesiastes 1:9–14), and this debate harks back, of course, to R. A. Fisher's analogy of focusing a microscope using small adjustments (Fisher, 1930).
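Fisher's microscope analogy has a well-known quantitative form in his geometric model. In the standard approximation for many independent dimensions, the probability that a random change of magnitude r improves a phenotype is 1 − Φ(r√n / d), where n is the number of dimensions, d is the diameter of the sphere on which the current phenotype lies around the optimum, and Φ is the standard normal cumulative distribution function. The sketch below simply evaluates that expression; the function name and the chosen values of n and d are ours, for illustration only.

```python
from math import erf, sqrt

def prob_beneficial(r, n, d):
    """Fisher's (1930) geometric-model approximation: probability that a random
    change of magnitude r improves a phenotype with n independent dimensions
    lying at distance d / 2 from the optimum (d is Fisher's 'diameter')."""
    x = r * sqrt(n) / d
    # 1 - Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the normal CDF.
    return 0.5 * (1 - erf(x / sqrt(2)))

# Illustrative values only: 10 dimensions, diameter 1.
for r in (0.001, 0.1, 0.5, 1.0):
    print(r, round(prob_beneficial(r, n=10, d=1.0), 3))
```

As r shrinks towards zero the probability approaches one half, whereas large changes almost never improve fit; this is the sense in which small adjustments are favoured, in keeping with the predominance of changes of small effect.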
18.5 Mind the gaps
Addressing what is missing in the study of animal evolution is unavoidable and necessary, not least because doing so demonstrates openness, attempts to define the limits of our knowledge, and indicates possible directions for future research. The influence of missing empirical information can be substantial, and assessing the impact of missing fossils, missing taxa, and missing data is almost a discipline in itself within systematics. What is not known can influence estimates of tree topology and stability, and the biological inferences we are prepared to make (see Wiens, 2006; Geuten et al., 2007; Fitzhugh, 2008). In phylogeny, should missing features be scored as losses or simply as missing data, and when are multiple related missing features indicative of single losses (e.g. the deletion of strings of nucleotides or the loss of entire organ systems)? In palaeontology and evo-devo, when can absence of evidence be used as evidence of absence?
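Whether a missing feature is scored as a loss or as missing data has direct consequences for tree length under parsimony. The sketch below uses Fitch's algorithm for a single binary character on a rooted binary tree, treating '?' as compatible with both states; the four-taxon tree and the character scores are hypothetical.

```python
STATES = {"0", "1"}  # 0 = absent, 1 = present

def fitch_length(tree, scores):
    """Fitch parsimony length of one character on a rooted binary tree.
    tree: nested 2-tuples of taxon names; scores: taxon -> '0', '1' or '?'.
    '?' (missing data) is treated as compatible with every state."""
    def visit(node):
        if isinstance(node, str):
            s = scores[node]
            return (set(STATES) if s == "?" else {s}), 0
        (left, lcost), (right, rcost) = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared, lcost + rcost
        return left | right, lcost + rcost + 1  # no overlap: one extra change
    return visit(tree)[1]

tree = (("A", "C"), ("B", "D"))                        # hypothetical topology
as_missing = {"A": "1", "B": "1", "C": "?", "D": "0"}  # C scored as unknown
as_loss = {**as_missing, "C": "0"}                     # C scored as a loss
print(fitch_length(tree, as_missing))  # 1
print(fitch_length(tree, as_loss))     # 2
```

Coding the unknown state of taxon C as an absence adds a step that coding it as missing does not, so the same gap can count for or against a topology depending solely on how it is scored.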
Incomplete information necessarily pushes us either towards caution, for fear that any inferences from gappy data may be deemed premature, or towards bravery (perhaps even foolhardiness), as the constant need to take stock of available evidence drives phylogenetic estimates, character mapping, taxonomic revisions, recalibrated histories, and the desire for a narrative that explains biodiversity through space and time. Diligent researchers are keen to indicate the strength of their arguments by circumscribing the limits and possible influence of what is not known, at the risk of undermining any conclusions drawn from what is known. In contrast, selective sampling can provide more robust arguments and may obviate the need to consider uncertainty or less compelling scenarios. Though we do not set out to sample selectively, the nature of certain data sets puts us firmly at the mercy of exemplars. Just as early SSU rDNA estimates of animal phylogeny relied on single taxa as representatives of entire phyla, we have seen phylogenomic analyses suffer from the over-representation of taxonomically biased model organisms, or from data sets unbalanced because more or fewer expressed sequence tags (ESTs) happen to be available from unrelated research. Using all available evidence from GenBank to estimate animal interrelationships would be cumbersome and unwise, but that is not to say we should not consider all the available data when making statements on homology, and sample from them to build balanced, representative data sets.
Balancing taxon and character sampling is difficult and has been the focus of empirical and theoretical studies (e.g. Graybeal, 1998; Pollock et al., 2002), but there is little doubt that with each new data set we are liable to repeat the mistakes of insufficient or biased sampling. In many cases we simply do not know that our sampling is insufficient or biased, or we may not be able to address any shortfalls until new data sets become available. Many gaps in phylogenetic data sets are known characters that have yet to be scored for key taxa. Meanwhile, expert morphologists and taxonomists are declining in number, character coding is frequently controversial, archival specimens may not be available or suitable for filling in the missing data, and the animals themselves may be difficult to sample, being rare, cryptic, geographically isolated, elusive, or extinct. We need to live with gaps but also to recognize the need to address them when the opportunity arises.
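One simple way to make sampling gaps visible before analysis is to tabulate, for each taxon, the fraction of genes in a concatenated matrix for which it actually has data. The sketch below assumes the matrix is summarized as a set of sampled genes per taxon; the taxon names, gene names, and the 0.5 threshold are hypothetical and arbitrary.

```python
# Hypothetical concatenated matrix: taxon -> set of genes actually sampled.
# Gene and taxon names are invented for illustration.
genes = ["g1", "g2", "g3", "g4", "g5"]
sampled = {
    "model_organism": {"g1", "g2", "g3", "g4", "g5"},
    "key_taxon": {"g2"},
    "other_taxon": {"g1", "g3", "g4"},
}

def occupancy(sampled, genes):
    """Fraction of genes with data for each taxon (1.0 = nothing missing)."""
    wanted = set(genes)
    return {t: len(g & wanted) / len(wanted) for t, g in sampled.items()}

def flag_sparse(sampled, genes, min_fraction=0.5):
    """Taxa whose coverage falls below an arbitrary, user-chosen threshold."""
    return sorted(t for t, f in occupancy(sampled, genes).items() if f < min_fraction)

print(occupancy(sampled, genes))
print(flag_sparse(sampled, genes))  # ['key_taxon']
```

Such a table does not decide whether a sparsely sampled key taxon should be kept or dropped, but it states the gap explicitly instead of leaving it buried in the matrix.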
The age of genomics arrived with the expectation that knowledge of complete genetic blueprints would provide a surfeit of phylogenetic information for robust tree reconstruction. This has yet to happen, since form, function, and homology have been established for very few components of genomes (Kuzniar et al., 2008). For animal evolutionary biologists the post-genomic era is a long way off, not just because of our limited understanding of the available genomes, but also because so few genomes have been characterized. Sampling systematically across the animal tree of life is an important strategy in developing comparative genomic data sets, but until now evolutionary biologists have rarely dictated sampling priorities. Furthermore, even a cursory look at the revolutions in molecular systematics shows how sampling just a few key taxa can upset our entire understanding of animal evolution. For example, it was preliminary molecular systematic surveys of flatworms that highlighted the phylogenetic uniqueness of acoelomorph flatworms (Carranza et al., 1997; Littlewood et al., 1999) and that led ultimately to their current status: distinct from the Platyhelminthes and important as links to our deep bilaterian past (Baguñà et al., 2008; Hejnol and Martindale, 2008b). Undoubtedly, denser sampling of animal genomes will provide more surprises.
Whilst evolutionary biologists are constantly concerned with homology, either implicitly or explicitly (see the recent review by Szucsich and Wirkner, 2007), large-scale data sets are moving us away from an intimate understanding of all the statements of homology that we make or rely upon. To some, this may appear to be a neglect of our responsibility as those whose task it is to detect, highlight, and interpret the evidence for shared ancestry. Recently there has been a shift away from poring over nucleotide and amino acid alignments with reference to secondary structures, open reading frames, and function, where indels (insertions/deletions) might be placed judiciously and exclusion sets chosen carefully, towards automation in order to harness considerable volumes of data (Wong et al., 2008). A plethora of data requires the building and implementation of bioinformatic pipelines that make many of these decisions for us, swiftly, consistently (according to given criteria), and routinely, in the hope that we are minimizing noise and maximizing signal. Whilst these routines and algorithms may be born of an understanding of the underlying data, such automated efforts do not negate the need to make evolutionary sense of the biological data, and we must be wary of opening new gaps in our understanding.
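As an illustration of the kind of fixed criterion a pipeline applies, the sketch below drops alignment columns in which the proportion of gaps exceeds a threshold. It is a hypothetical minimal filter, not the algorithm of any particular tool (dedicated programs such as Gblocks and trimAl use more elaborate criteria), and the sequences and threshold are invented.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns in which more than max_gap_fraction of the
    sequences have a gap ('-'). alignment: dict name -> aligned sequence,
    all of equal length. The default threshold is an arbitrary choice."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}

# Hypothetical toy alignment; taxon1 and taxon2 share a gap in column 4.
aln = {
    "taxon1": "ATG-CA",
    "taxon2": "ATG--A",
    "taxon3": "A-GTCA",
}
print(filter_gappy_columns(aln))
```

Applied to the toy alignment, the filter removes column 4, where taxon1 and taxon2 share a gap. That shared gap is exactly the kind of indel that, examined by eye, might have been treated as a candidate statement of homology; the rule is fast and consistent, but it makes no evolutionary sense of what it discards.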
18.6 Learning from the past and taking advantage of the present
In an era of unprecedented access to information, we have an opportunity to embrace considerable bodies of primary data, metadata, and the thoughts and arguments of generations of researchers. Global efforts to digitize literature and specimens, internet tools that mine, parse, and link databases, and a generation of researchers willing to synthesize existing information are together generating new understanding, whilst complementary efforts by others to generate primary data continue unabated. Indeed, the increase in the rate at which gene sequence data can now be generated with second-generation sequencing is phenomenal, and third-generation sequencing, now on the horizon, promises orders of magnitude more data (Shendure and Ji, 2008). The information revolution is vast in scale and breadth and brings with it new powers and challenges, not least for bioinformaticians (Helaers et al., 2008; Pop and Salzberg, 2008). New ways of studying genomes and inferring historical events challenge underlying philosophies and resurrect arguments against phenetics, but there is little doubt that the presence or absence of genes, gene networks, and biochemical pathways, the relative arrangement of genes, and so on, provide an entirely new vocabulary with which to consider the past (Boore, 2006; Ding et al., 2008; Dutilh et al., 2008).
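The presence or absence of genes translates naturally into a binary matrix. The sketch below builds one from hypothetical gene-family sets and computes a Jaccard distance between two genomes; the species and family names are invented, and the distance is included precisely because it is phenetic: it summarizes overall similarity in gene content rather than identifying shared derived presences or absences.

```python
# Hypothetical gene-family content of three genomes (all names invented).
genomes = {
    "species_X": {"famA", "famB", "famC"},
    "species_Y": {"famA", "famB"},
    "species_Z": {"famA", "famD"},
}

families = sorted(set().union(*genomes.values()))
# Binary presence (1) / absence (0) matrix: one row per genome.
matrix = {sp: [int(f in fams) for f in families] for sp, fams in genomes.items()}

def jaccard_distance(a, b):
    """1 - |shared| / |union|: an overall (phenetic) measure of gene content."""
    return 1 - len(a & b) / len(a | b)

print(families)
print(matrix)
print(round(jaccard_distance(genomes["species_X"], genomes["species_Y"]), 3))
```

Treating the columns of the matrix as characters in a phylogenetic analysis, rather than collapsing them into a single distance, is one way of meeting the phenetic objection.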
Although we strive for pragmatic approaches to the onslaught of information, and welcome the opportunities to bring disparate fields back into the fold, caution is always at the back of our minds. For example, although we might expect to be able to access information at the click of a mouse, at what point should we accept any of the following without a second thought: a gene sequence with no associated voucher specimen, a distribution map based on inaccurate identifications or DNA barcodes, a tree topology based on data we have not seen, a cluster of genes we have not verified as being orthologous, a supertree? Clearly, no individual can make all these decisions independently, and it is as a community that we police ourselves and the data we choose to accept as fit for purpose. Systematics continues to be about maximizing the signal and minimizing the noise, but there is a constant battle against a modern trend towards ‘one-gene-fits-all’ approaches, the undermining of systems that ‘ain't broke and don't need fixing’ (e.g. the Linnaean system of classification), a lack of rigour in understanding or implementing the tools (and underlying philosophies) of the trade, and false claims that we will have catalogued or barcoded every species on the planet and resolved the position of every twig of the tree of life within the next 25 years. Rhetoric aside, there has been no better time to study animal evolution.
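Reciprocal best hits are one widely used, if crude, heuristic for proposing that two genes are orthologous: gene a in one genome and gene b in another are paired only if each is the other's highest-scoring match. The sketch below, with hypothetical gene names and invented similarity scores, is offered as an example of the kind of decision that is easy to automate and easy to accept without verification, not as a method endorsed here.

```python
# Hypothetical pairwise similarity scores between genes of genome A (a*) and
# genome B (b*), e.g. from a sequence-similarity search; values are invented.
scores = {
    ("a1", "b1"): 950, ("a1", "b2"): 400,
    ("a2", "b1"): 600, ("a2", "b2"): 300,
    ("a3", "b2"): 500,
}

def best_hits(scores, query_index):
    """Map each gene on one side (0 = genome A, 1 = genome B) to its
    highest-scoring partner on the other side; ties keep the first seen."""
    best = {}
    for pair, s in scores.items():
        q, t = pair[query_index], pair[1 - query_index]
        if q not in best or s > best[q][1]:
            best[q] = (t, s)
    return {q: t for q, (t, _) in best.items()}

def reciprocal_best_hits(scores):
    """Pairs (a, b) in which each gene is the other's best hit."""
    a_to_b = best_hits(scores, 0)
    b_to_a = best_hits(scores, 1)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)

print(reciprocal_best_hits(scores))  # [('a1', 'b1'), ('a3', 'b2')]
```

Gene a2, which might be a paralogue of a1, is left without a partner, and the heuristic is known to mislead when gene duplication and loss are frequent; a cluster of genes produced this way still needs verification before it is used as evidence of orthology.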