MetaGeniE: Characterizing human clinical samples using deep metagenomic sequencing

Arun Rawat, David M. Engelthaler, Elizabeth M. Driebe, Paul Keim, Jeffrey T. Foster

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

With the decreasing cost of next-generation sequencing, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. Among the primary challenges of clinical metagenomic sequencing is the rapid filtering of human reads to survey for pathogens with high specificity and sensitivity. Metagenomes are inherently variable due to different microbes in the samples and their relative abundance, the size and architecture of genomes, and factors such as target DNA amounts in tissue samples (i.e., human DNA versus pathogen DNA concentration). This variation in metagenomes typically manifests in sequencing datasets as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. In addition to these challenges posed by the composition of metagenomes, high numbers of reads generated from high-throughput deep sequencing pose immense computational challenges. Accurate identification of pathogens is confounded by individual reads mapping to multiple different reference genomes due to gene similarity in different taxa present in the community or close relatives in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity, and speed of detection. The efficiency of detection of pathogens in clinical samples is largely dependent on the desired taxonomic resolution of the organisms. We have developed an efficient strategy that identifies "all against all" relationships between sequencing reads and reference genomes. Our approach allows for scaling to large reference databases and then genome reconstruction by aggregating global and local alignments, thus allowing genetic characterization of pathogens at higher taxonomic resolution. These results were consistent with strain-level SNP genotyping and bacterial identification from laboratory culture.
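The two difficulties the abstract names, host-read filtering and reads that map to several reference genomes, can be illustrated with a small toy sketch. This is not the MetaGeniE implementation. The function name `summarize_hits`, the `"human"` host label, and the `(read_id, reference, score)` tuple layout are all hypothetical, and the rule of discarding any read with a host hit is a deliberate simplification.

```python
from collections import defaultdict

HOST = "human"  # hypothetical label for the host reference genome


def summarize_hits(hits, host=HOST):
    """Toy read-to-reference summary (illustrative assumption only).

    hits: iterable of (read_id, reference, score) tuples, such as an
    aligner might report after parsing. Returns, per reference, how many
    reads hit it uniquely and how many hit it ambiguously (tied best score
    across several references).
    """
    # Build the read -> {reference: best score} relation.
    refs_by_read = defaultdict(dict)
    for read_id, reference, score in hits:
        best = refs_by_read[read_id].get(reference)
        if best is None or score > best:
            refs_by_read[read_id][reference] = score

    summary = defaultdict(lambda: {"unique": 0, "ambiguous": 0})
    for ref_scores in refs_by_read.values():
        if host in ref_scores:
            continue  # host filter: drop every read that aligns to the host
        top = max(ref_scores.values())
        best_refs = [ref for ref, s in ref_scores.items() if s == top]
        kind = "unique" if len(best_refs) == 1 else "ambiguous"
        for ref in best_refs:
            summary[ref][kind] += 1
    return dict(summary)


if __name__ == "__main__":
    example = [
        ("r1", "human", 60), ("r1", "E. coli", 40),     # host read, dropped
        ("r2", "E. coli", 50),                          # unique hit
        ("r3", "E. coli", 45), ("r3", "Shigella", 45),  # tie between relatives
    ]
    print(summarize_hits(example))
```

Separating unique from ambiguous counts shows why close relatives in a reference database complicate identification: a tied read supports several taxa equally and cannot resolve them on its own.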

Original language: English (US)
Article number: e110915
Journal: PLoS One
Volume: 9
Issue number: 11
DOI: 10.1371/journal.pone.0110915
State: Published - Nov 3 2014

Fingerprint

Metagenome
High-Throughput Nucleotide Sequencing
Metagenomics
Pathogens
Genome
Genes
genome
DNA
Databases
microbial communities
pathogens
Genome Size
Sensitivity and Specificity
pathogen characterization
microbial detection
pathogen identification
sampling
Single Nucleotide Polymorphism
genotyping
Costs and Cost Analysis

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

MetaGeniE: Characterizing human clinical samples using deep metagenomic sequencing. / Rawat, Arun; Engelthaler, David M.; Driebe, Elizabeth M.; Keim, Paul; Foster, Jeffrey T.

In: PLoS One, Vol. 9, No. 11, e110915, 03.11.2014.

@article{f4c514faa0824ae8a5d66f00954e2cfe,
title = "MetaGeniE: Characterizing human clinical samples using deep metagenomic sequencing",
author = "Arun Rawat and Engelthaler, {David M.} and Driebe, {Elizabeth M.} and Paul Keim and Foster, {Jeffrey T.}",
year = "2014",
month = "11",
day = "3",
doi = "10.1371/journal.pone.0110915",
language = "English (US)",
volume = "9",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "11",

}

TY - JOUR

T1 - MetaGeniE

T2 - Characterizing human clinical samples using deep metagenomic sequencing

AU - Rawat, Arun

AU - Engelthaler, David M.

AU - Driebe, Elizabeth M.

AU - Keim, Paul

AU - Foster, Jeffrey T.

PY - 2014/11/3

Y1 - 2014/11/3

UR - http://www.scopus.com/inward/record.url?scp=84909957860&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84909957860&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0110915

DO - 10.1371/journal.pone.0110915

M3 - Article

C2 - 25365329

AN - SCOPUS:84909957860

VL - 9

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 11

M1 - e110915

ER -