Abstract
Whole-genome sequencing (WGS) of bacterial isolates has become standard practice in many laboratories. Applications for WGS analysis include phylogeography and molecular epidemiology, using single nucleotide polymorphisms (SNPs) as the unit of evolution. NASP was developed as a reproducible method that scales well with the hundreds to thousands of WGS data typically used in comparative genomics applications. In this study, we demonstrate how NASP compares with other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results demonstrate that NASP produces similar, and often better, results in comparison with other pipelines, but is much more flexible in terms of data input types, job management systems, diversity of supported tools and output formats. We also demonstrate differences in results based on the choice of the reference genome and choice of inferring phylogenies from concatenated SNPs or alignments including monomorphic positions. NASP represents a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.
Original language | English (US) |
---|---|
Pages (from-to) | e000074 |
Journal | Microbial genomics |
Volume | 2 |
Issue number | 8 |
DOIs | 10.1099/mgen.0.000074 |
State | Published - Aug 1 2016 |
Keywords
- bioinformatics
- Phylogeography
- SNPs
ASJC Scopus subject areas
- Medicine(all)
Cite this
NASP: an accurate, rapid method for the identification of SNPs in WGS datasets that supports flexible input and output formats. / Sahl, Jason W.; Lemmer, Darrin; Travis, Jason; Schupp, James M.; Gillece, John D.; Aziz, Maliha; Driebe, Elizabeth M.; Drees, Kevin P.; Hicks, Nathan D.; Williamson, Charles Hall Davis; Hepp, Crystal M.; Smith, David Earl; Roe, Chandler; Engelthaler, David M.; Wagner, David M.; Keim, Paul S.
In: Microbial genomics, Vol. 2, No. 8, 01.08.2016, p. e000074.
Research output: Contribution to journal › Article
TY - JOUR
T1 - NASP
T2 - an accurate, rapid method for the identification of SNPs in WGS datasets that supports flexible input and output formats
AU - Sahl, Jason W.
AU - Lemmer, Darrin
AU - Travis, Jason
AU - Schupp, James M.
AU - Gillece, John D.
AU - Aziz, Maliha
AU - Driebe, Elizabeth M.
AU - Drees, Kevin P.
AU - Hicks, Nathan D.
AU - Williamson, Charles Hall Davis
AU - Hepp, Crystal M.
AU - Smith, David Earl
AU - Roe, Chandler
AU - Engelthaler, David M.
AU - Wagner, David M.
AU - Keim, Paul S.
PY - 2016/8/1
Y1 - 2016/8/1
AB - Whole-genome sequencing (WGS) of bacterial isolates has become standard practice in many laboratories. Applications for WGS analysis include phylogeography and molecular epidemiology, using single nucleotide polymorphisms (SNPs) as the unit of evolution. NASP was developed as a reproducible method that scales well with the hundreds to thousands of WGS data typically used in comparative genomics applications. In this study, we demonstrate how NASP compares with other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results demonstrate that NASP produces similar, and often better, results in comparison with other pipelines, but is much more flexible in terms of data input types, job management systems, diversity of supported tools and output formats. We also demonstrate differences in results based on the choice of the reference genome and choice of inferring phylogenies from concatenated SNPs or alignments including monomorphic positions. NASP represents a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.
KW - bioinformatics
KW - Phylogeography
KW - SNPs
UR - http://www.scopus.com/inward/record.url?scp=85045985271&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045985271&partnerID=8YFLogxK
U2 - 10.1099/mgen.0.000074
DO - 10.1099/mgen.0.000074
M3 - Article
C2 - 28348869
AN - SCOPUS:85045985271
VL - 2
SP - e000074
JO - Microbial genomics
JF - Microbial genomics
SN - 2057-5858
IS - 8
ER -