Sunday, January 21, 2007

The Evolution of Mammalian Gene Families

Excerpts from an open access PLoS ONE article, plus a related news report and an associated PNAS paper:

Summary

Gene families are groups of homologous genes that are likely to have highly similar functions. Differences in family size due to lineage-specific gene duplication and gene loss may provide clues to the evolutionary forces that have shaped mammalian genomes. Here we analyze the gene families contained within the whole genomes of human, chimpanzee, mouse, rat, and dog. In total we find that more than half of the 9,990 families present in the mammalian common ancestor have either expanded or contracted along at least one lineage. Additionally, we find that a large number of families are completely lost from one or more mammalian genomes, and a similar number of gene families have arisen subsequent to the mammalian common ancestor. Along the lineage leading to modern humans we infer the gain of 689 genes and the loss of 86 genes since the split from chimpanzees, including changes likely driven by adaptive natural selection. Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences. This genomic "revolving door" of gene gain and loss represents a large number of genetic differences separating humans from our closest relatives.

Introduction

Explaining the obvious morphological, physiological, and behavioral traits that separate modern humans from our closest relatives, the chimpanzees, is challenging given the low level of nucleotide divergence between the two species. More than 30 years have passed since King and Wilson ("Evolution at two levels in humans and chimpanzees") first pointed out this apparent paradox, saying that "the genetic distance between humans and the chimpanzee is probably too small to account for their substantial organismal differences". To explain the paradox, King and Wilson proposed that regulatory changes rather than protein-coding mutations were responsible for the vast majority of observed biological differences. Evidence gathered since that time demonstrates that amino acid and regulatory sequence changes have both been involved in the evolution of uniquely human phenotypes.

A third source of differentiation, necessarily overlooked in comparison of orthologous sequences, is the differential duplication and deletion of chromosomal regions. Among human segmental duplications larger than 20 kilobases, 33% are not present in chimpanzee. In total, it is estimated that at least 2.7% of the total genome has been uniquely duplicated subsequent to the human-chimpanzee split; this number does not factor either deletions or small insertions into the total amount of divergence and therefore represents a minimum estimate. Per base pair, this translates into more than twice as many nucleotides unique to each species as there are nucleotide substitutions in orthologous sequences. Without accounting for differences in the total DNA unique to each species, we cannot hope to take a proper accounting of the meaningful genetic divergence between humans and chimpanzees.

The most interesting duplication/deletion events from an evolutionary viewpoint are those that involve intact genes. Gene duplication has been hypothesized to be a powerful engine for evolutionary change in general, and gene loss has been put forward as a common, advantageous response to changes in selective regimes in human history. Recent gene duplicates are estimated to have arisen in the human genome at a rate of 0.009 per gene per million years (my). Using this rate, we would expect there to have been 1,188 new gene duplicates in the human genome since our split with chimpanzee (0.009 duplications/gene/my × 22,000 genes × 6 my). Assuming equal numbers of gene gains and losses and similar rates of turnover in chimps, the total number of genes in humans not present in chimps would be 2,376 (1,188 gained on the human lineage plus 1,188 lost on the chimpanzee lineage), or approximately 11% of all genes. This estimate of total genic divergence implied by rates of gene duplication has been widely overlooked due to the pervasive emphasis on nucleotide divergence between orthologous genes. Although this estimate assumes identical rates of gene gain and loss, and our coarse calculations have not considered that new gene duplicates are also the most likely genes to be lost, the consistency of gene number among fully sequenced mammals suggests that this is not an onerous assumption across short evolutionary time periods.
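The back-of-the-envelope estimate in the paragraph above can be reproduced in a few lines. This is only a sketch of that arithmetic. The inputs (0.009 duplicates per gene per my, 22,000 genes, a 6 my split) are the figures quoted in the text, and the doubling rests on the same stated assumptions: losses equal gains, and chimpanzees turn over genes at the same rate as humans.

```python
# Reproduces the coarse human-chimp gene-content estimate from the text.
DUP_RATE = 0.009     # new duplicates per gene per million years (my), as quoted
GENE_COUNT = 22_000  # approximate number of human genes, as quoted
SPLIT_MY = 6         # time since the human-chimpanzee split, in my, as quoted


def expected_gains(rate: float, genes: int, time_my: float) -> float:
    """Expected number of new gene duplicates along a single lineage."""
    return rate * genes * time_my


human_gains = expected_gains(DUP_RATE, GENE_COUNT, SPLIT_MY)
# Genes in humans but not chimps = genes gained on the human lineage plus genes
# lost on the chimp lineage. Assuming losses equal gains and equal rates in both
# species, chimp losses equal human gains, so the total is 2 * human_gains.
human_not_chimp = 2 * human_gains
fraction = human_not_chimp / GENE_COUNT

print(f"new duplicates on the human lineage: {human_gains:.0f}")
print(f"genes in humans absent from chimps:  {human_not_chimp:.0f}")
print(f"fraction of all genes:               {fraction:.1%}")
```

The printed fraction is about 10.8%, which the text rounds to approximately 11%.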

The process of differential gene gain and loss among species results in gene families that share sequence and functional homology but differ in gene number. Changes in gene family size have likely been important during human evolution, and large differences in gene family size are generally ascribed to a selective advantage for either an increased or decreased gene number. While many of these differences may indeed be the result of natural selection, there has been little effort to account for the accumulation of differences due to random processes. For instance, a difference of 20 genes within a single family may be remarkable between human and chimpanzee, but not between human and mouse, or human and dog. Unlike the analysis of orthologous sequences, where there are widely accepted neutral expectations for molecular evolution, there has been no corresponding framework for the study of gene family evolution until recently.
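To see why the same size difference means more between close relatives than between distant ones, here is a small simulation of purely random gain and loss. It assumes a simple linear birth-death process in which every gene independently duplicates and is lost at the same rate. The rate (0.002 per gene per my), the ancestral family size (10 genes) and the branch lengths (6 and 90 my) are hypothetical values chosen for illustration, not estimates from the paper.

```python
import random
import statistics


def simulate_family_size(start: int, lam: float, t: float, rng: random.Random) -> int:
    """Gillespie simulation of a linear birth-death process over t my.

    Assumption: each gene duplicates at rate lam and is lost at rate lam, so with
    n genes the total event rate is 2 * lam * n and gains and losses are equally likely.
    """
    n, clock = start, 0.0
    while n > 0:                      # a family with no genes stays extinct
        clock += rng.expovariate(2 * lam * n)
        if clock > t:                 # next event falls after the branch ends
            break
        n += 1 if rng.random() < 0.5 else -1
    return n


def mean_size_difference(start: int, lam: float, t: float, reps: int, seed: int = 1) -> float:
    """Mean |size_A - size_B| for two lineages that each evolve for t my
    from a shared ancestral family of `start` genes."""
    rng = random.Random(seed)
    return statistics.mean(
        abs(simulate_family_size(start, lam, t, rng) - simulate_family_size(start, lam, t, rng))
        for _ in range(reps)
    )


LAM = 0.002  # hypothetical per-gene duplication (= loss) rate per my
diff_short = mean_size_difference(10, LAM, 6, 2000)   # a recent, human-chimp-like split
diff_long = mean_size_difference(10, LAM, 90, 2000)   # an assumed much deeper split
print(f"mean size difference after 6 my:  {diff_short:.2f} genes")
print(f"mean size difference after 90 my: {diff_long:.2f} genes")
```

Even with no selection at all, the expected size difference grows with divergence time. That growth is the neutral baseline against which an unusually large difference has to be judged.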

The completed sequencing of multiple mammalian genomes provides unprecedented insight into the gain and loss of genetic material between species, and into the genomic changes exclusive to humans. In this paper we analyze gene gain and loss at a genomic scale by studying the expansion and contraction of gene families in the whole genomes of human, chimpanzee, mouse, rat, and dog. Using gene family assignments from the Ensembl project (version 41 - October 2006) we assign probabilities to the observed changes in gene family size along each mammalian lineage using a likelihood method that makes efficient use of genomic data in a phylogenetic context. Our statistical framework provides a basis for improved inferences about causative evolutionary mechanisms by providing an expectation for the extent of variation in gene family size when gains and losses occur randomly. This means that we can identify branches of the phylogenetic tree where larger-than-expected contractions or expansions potentially indicate the action of adaptive natural selection.
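A likelihood method of this kind needs, for each branch, the probability of going from s genes at the start of the branch to c genes at its end. Below is a sketch of that building block. It assumes a linear birth-death process with a single rate λ shared by gains and losses, and uses the standard closed-form result for that case with α = λt/(1 + λt). The summary does not spell out the authors' exact model, so treat this as an illustrative assumption rather than their implementation. The rate value in the usage lines is hypothetical.

```python
from math import comb


def transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(c genes at the end of a branch | s genes at its start).

    Assumes a linear birth-death process in which every gene is duplicated and
    lost at the same rate lam (per gene per my) along a branch of length t.
    """
    if s == 0:
        return 1.0 if c == 0 else 0.0  # an extinct family cannot regain genes
    alpha = lam * t / (1 + lam * t)
    return sum(
        comb(s, j) * comb(s + c - j - 1, s - 1)
        * alpha ** (s + c - 2 * j) * (1 - 2 * alpha) ** j
        for j in range(min(s, c) + 1)
    )


# Usage: how surprising is a jump from 10 to 30 genes on a short vs. a long branch?
lam = 0.002  # hypothetical rate per gene per my
print(transition_prob(10, 30, lam, 6))   # short branch
print(transition_prob(10, 30, lam, 90))  # long branch
```

Multiplying such probabilities over all branches of the species tree gives the likelihood of one family's observed sizes. A family whose change on a given branch has very low probability is a candidate for non-random (for example, adaptive) expansion or contraction.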

Our investigation suggests that random processes explain most changes in gene family size; however, we find several families with larger than expected changes, including expansions in the human lineage for families with brain-specific functions. Additionally, we find that the total number of gene differences between humans and chimps estimated by our method is similar to that predicted above from independent analyses of recent segmental duplications. In total, our results support mounting evidence that gene duplication and loss may have played a greater role than nucleotide substitution in the evolution of uniquely human phenotypes, and certainly a greater role than has been widely appreciated.

Full article available via the citation:

Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW (2006) The Evolution of Mammalian Gene Families. PLoS ONE 1(1): e85. doi:10.1371/journal.pone.0000085

-------

A BBC UK news report from September 2002:

Humans and chimps 'not so close'

...Most studies suggest that 98.5% of our genetic code can also be found in the chimp.

However, a study published in the journal Proceedings of the National Academy of Sciences says the true difference may be much larger.

In fact, say the researchers, only 95% of our DNA may be the same as the chimpanzee's.

Professor Roy Britten, of the California Institute of Technology, US, said that most studies did not take into account large sections of DNA that are not found in the genomes of both humans and chimps.
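The gap between the 98.5% and 95% figures comes down to what gets counted. The sketch below is a simplified illustration, not the procedure used in the study. It scores a made-up toy alignment in two ways: substitutions only over aligned bases, and substitutions plus indel bases over all non-empty columns.

```python
def divergence(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (substitution-only, substitution+indel) divergence for two
    equal-length aligned sequences, where '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    subs = gaps = matched = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue          # column empty in both sequences; ignore it
        if a == "-" or b == "-":
            gaps += 1         # a base present in only one sequence (indel)
        else:
            matched += 1      # both sequences have a base here
            if a != b:
                subs += 1     # single-nucleotide substitution
    sub_only = subs / matched
    with_indels = (subs + gaps) / (matched + gaps)
    return sub_only, with_indels


# Toy alignment, invented for illustration: one substitution and a 4-base indel.
human = "ACGTACGTACGTACGTACGT"
chimp = "ACGTACCTACGT----ACGT"
sub_only, with_indels = divergence(human, chimp)
print(f"substitutions only:   {sub_only:.1%}")
print(f"including indel bases: {with_indels:.1%}")
```

The same pair of sequences looks much more divergent once bases that exist in only one of them are counted.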

From a related 2003 Proceedings of the National Academy of Sciences open access paper:

Abstract

Introduction

Mutations in the DNA are the source of variation in Darwinian evolution. Therefore it is likely that the examination of DNA differences between closely related species or among polymorphic variations in DNA of a given species will give insight into the nature of the mutations and the process of evolution. In the present paper, published and unpublished data are summarized for examples from several distantly related phylogenetic groups, and the data show that indels dominate the process of early divergence. A continuing problem with these data is the upper limit on the size of detected gaps and the bias against larger ones. The groups sampled are apes (chimp-human DNA comparison), sea urchins (Strongylocentrotus purpuratus polymorphism), bacteria (Escherichia coli substrain comparison), insects (Drosophila polymorphism), nematodes (Caenorhabditis elegans polymorphism), and plants (Arabidopsis polymorphism). It is also noted that human genetic diseases are frequently caused by indels. The first part of the paper summarizes the results for samples of chimp DNA compared with the human genome sequence. Then an example of sea urchin polymorphism is briefly described. Initial comparison of two strains of E. coli O157:H7 is described. Finally, the published polymorphism data are reviewed and brought together with the data reported here to draw the conclusion that indel formation is a major and significant evolutionary process.

-------

Recent posts:

"Renegade RNA: 'It goes where no bit of it has gone before'"

"Picobiliphytes: A marine picoplanktonic algal group with unknown affinities to other Eukaryotes"