Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin

Nicholas A. Bokulich, Benjamin D. Kaehler, Jai Ram Rideout, Matthew Dillon, Evan Bolyen, Rob Knight, Gavin A. Huttley, James G Caporaso

Research output: Contribution to journalArticle

62 Citations (Scopus)

Abstract

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

Original languageEnglish (US)
Article number90
JournalMicrobiome
Volume6
Issue number1
DOIs
StatePublished - May 17 2018

Fingerprint

Genes
Taxes
Benchmarking
Microbiota
Machine Learning

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

Cite this

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. / Bokulich, Nicholas A.; Kaehler, Benjamin D.; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A.; Caporaso, James G.

In: Microbiome, Vol. 6, No. 1, 90, 17.05.2018.

Research output: Contribution to journalArticle

Bokulich, Nicholas A. ; Kaehler, Benjamin D. ; Rideout, Jai Ram ; Dillon, Matthew ; Bolyen, Evan ; Knight, Rob ; Huttley, Gavin A. ; Caporaso, James G. / Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. In: Microbiome. 2018 ; Vol. 6, No. 1.
@article{c844697b124e4dd5ade880a89dac8708,
title = "Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin",
abstract = "Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated {"}novel{"} marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.",
author = "Bokulich, {Nicholas A.} and Kaehler, {Benjamin D.} and Rideout, {Jai Ram} and Matthew Dillon and Evan Bolyen and Rob Knight and Huttley, {Gavin A.} and Caporaso, {James G}",
year = "2018",
month = "5",
day = "17",
doi = "10.1186/s40168-018-0470-z",
language = "English (US)",
volume = "6",
journal = "Microbiome",
issn = "2049-2618",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin

AU - Bokulich, Nicholas A.

AU - Kaehler, Benjamin D.

AU - Rideout, Jai Ram

AU - Dillon, Matthew

AU - Bolyen, Evan

AU - Knight, Rob

AU - Huttley, Gavin A.

AU - Caporaso, James G

PY - 2018/5/17

Y1 - 2018/5/17

N2 - Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

AB - Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

UR - http://www.scopus.com/inward/record.url?scp=85052651259&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052651259&partnerID=8YFLogxK

U2 - 10.1186/s40168-018-0470-z

DO - 10.1186/s40168-018-0470-z

M3 - Article

VL - 6

JO - Microbiome

JF - Microbiome

SN - 2049-2618

IS - 1

M1 - 90

ER -