Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing

Talima R. Pearson, Joseph D. Busch, Jacques Ravel, Timothy D. Read, Shane D. Rhoton, Jana M. U'Ren, Tatum S. Simonson, Sergey M. Kachur, Rebecca R. Leadem, Michelle L. Cardon, Matthew N. Van Ert, Lynn Y. Huynh, Claire M. Fraser, Paul S. Keim

Research output: Contribution to journal › Article

194 Citations (Scopus)

Abstract

Phylogenetic reconstruction using molecular data is often subject to homoplasy, leading to inaccurate conclusions about phylogenetic relationships among operational taxonomic units. Compared with other molecular markers, single-nucleotide polymorphisms (SNPs) exhibit extremely low mutation rates, making them rare in recently emerged pathogens, but they are less prone to homoplasy and thus extremely valuable for phylogenetic analyses. Despite their phylogenetic potential, ascertainment bias occurs when SNP characters are discovered through biased taxonomic sampling; by using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery, we show that only polymorphisms lying along the evolutionary pathway between reference strains will be observed. We illustrate this in theoretical and simulated data sets in which complex phylogenetic topologies are reduced to linear evolutionary models. Using a set of 990 SNP markers, we also show how divergent branches in our topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades. These data allowed us to determine the ancestral root of B. anthracis, showing that it lies closer to a newly described "C" branch than to either of two previously described "A" or "B" branches. In addition, subclade rooting of the C branch revealed unequal evolutionary rates that seem to be correlated with ecological parameters and strain attributes. Our use of nonhomoplastic whole-genome SNP characters allows branch points and clade membership to be estimated with great precision, providing greater insight into epidemiological, ecological, and forensic questions.
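The discovery bias described in the abstract can be illustrated with a toy example. The sketch below is not the authors' method or data: the tree shape, branch names, SNP counts, and strain names (`refA`, `refB`, `query`) are all hypothetical, and it assumes no homoplasy, so each SNP arises once on a single branch. SNPs are "discovered" only by comparing the two reference genomes, and a third strain is then typed at those markers.

```python
# Toy rooted tree (hypothetical): number of SNPs that arose on each branch.
BRANCH_SNPS = {
    "root-n1": 4,
    "n1-refA": 5,
    "n1-n2": 3,
    "n2-refB": 6,
    "n2-query": 7,  # private branch of a strain NOT used for discovery
}

# Branches on the path from the root to each strain (hypothetical topology).
PATHS = {
    "refA": ["root-n1", "n1-refA"],
    "refB": ["root-n1", "n1-n2", "n2-refB"],
    "query": ["root-n1", "n1-n2", "n2-query"],
}


def genotype(strain):
    """Derived SNPs carried by a strain, as (branch, index) pairs."""
    return {(b, i) for b in PATHS[strain] for i in range(BRANCH_SNPS[b])}


def discover(references):
    """SNPs that are polymorphic among the reference genomes only."""
    genos = [genotype(r) for r in references]
    return set.union(*genos) - set.intersection(*genos)


def distance(a, b, markers):
    """Number of assayed markers at which two strains differ."""
    return len((genotype(a) ^ genotype(b)) & markers)


all_snps = set.union(*(genotype(s) for s in PATHS))
markers = discover(["refA", "refB"])

print(len(markers))                         # 14: only the refA-to-refB path
print(distance("query", "refB", all_snps))  # 13: true difference
print(distance("query", "refB", markers))   # 6: the 7 private SNPs are unseen
print(distance("query", "refA", markers))   # 8
print(distance("refA", "refB", markers))    # 14 = 8 + 6
```

In this toy case the 7 SNPs on the query strain's private branch are never discovered, so the strain collapses onto node `n2` on the path between the two references, while its distances to each reference along that path (8 and 6) still sum to the full reference-to-reference distance (14).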

Original language: English (US)
Pages (from-to): 13536-13541
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 101
Issue number: 37
DOI: https://doi.org/10.1073/pnas.0403844101
State: Published - Sep 14 2004

Fingerprint

Bacillus anthracis
Single Nucleotide Polymorphism
Genome
Mutation Rate
Linear Models

ASJC Scopus subject areas

  • Genetics
  • General

Cite this

Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. / Pearson, Talima R; Busch, Joseph D.; Ravel, Jacques; Read, Timothy D.; Rhoton, Shane D.; U'Ren, Jana M.; Simonson, Tatum S.; Kachur, Sergey M.; Leadem, Rebecca R.; Cardon, Michelle L.; Van Ert, Matthew N.; Huynh, Lynn Y.; Fraser, Claire M.; Keim, Paul S.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, No. 37, 14.09.2004, p. 13536-13541.

Pearson, TR, Busch, JD, Ravel, J, Read, TD, Rhoton, SD, U'Ren, JM, Simonson, TS, Kachur, SM, Leadem, RR, Cardon, ML, Van Ert, MN, Huynh, LY, Fraser, CM & Keim, PS 2004, 'Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing', Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 37, pp. 13536-13541. https://doi.org/10.1073/pnas.0403844101
@article{1e5891f83a6c4c1bacd9982f9fb2972a,
title = "Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing",
author = "Pearson, {Talima R} and Busch, {Joseph D.} and Jacques Ravel and Read, {Timothy D.} and Rhoton, {Shane D.} and U'Ren, {Jana M.} and Simonson, {Tatum S.} and Kachur, {Sergey M.} and Leadem, {Rebecca R.} and Cardon, {Michelle L.} and {Van Ert}, {Matthew N.} and Huynh, {Lynn Y.} and Fraser, {Claire M.} and Keim, {Paul S}",
year = "2004",
month = "9",
day = "14",
doi = "10.1073/pnas.0403844101",
language = "English (US)",
volume = "101",
pages = "13536--13541",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "37",

}

TY - JOUR

T1 - Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing

AU - Pearson, Talima R

AU - Busch, Joseph D.

AU - Ravel, Jacques

AU - Read, Timothy D.

AU - Rhoton, Shane D.

AU - U'Ren, Jana M.

AU - Simonson, Tatum S.

AU - Kachur, Sergey M.

AU - Leadem, Rebecca R.

AU - Cardon, Michelle L.

AU - Van Ert, Matthew N.

AU - Huynh, Lynn Y.

AU - Fraser, Claire M.

AU - Keim, Paul S

PY - 2004/9/14

Y1 - 2004/9/14

UR - http://www.scopus.com/inward/record.url?scp=4544319040&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544319040&partnerID=8YFLogxK

U2 - 10.1073/pnas.0403844101

DO - 10.1073/pnas.0403844101

M3 - Article

C2 - 15347815

AN - SCOPUS:4544319040

VL - 101

SP - 13536

EP - 13541

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 37

ER -