Best practices for evaluating single nucleotide variant calling methods for microbial genomics

Nathan D. Olson, Steven P. Lund, Rebecca E. Colman, Jeffrey T Foster, Jason W. Sahl, James M. Schupp, Paul S Keim, Jayne B. Morrow, Marc L. Salit, Justin M. Zook

Research output: Contribution to journal (Article)

45 Citations (Scopus)

Abstract

Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.

Original language: English (US)
Article number: 235
Journal: Frontiers in Genetics
Volume: 6
Issue number: JUL
DOI: https://doi.org/10.3389/fgene.2015.00235
State: Published - 2015

Fingerprint

Genomics
Practice Guidelines
Nucleotides
Research Design
Genome
Research Personnel
Technology
Research

Keywords

  • Indel
  • Next-generation sequencing
  • Performance metrics
  • Single nucleotide variants
  • Variant calling

ASJC Scopus subject areas

  • Genetics
  • Molecular Medicine
  • Genetics (clinical)

Cite this

Best practices for evaluating single nucleotide variant calling methods for microbial genomics. / Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T; Sahl, Jason W.; Schupp, James M.; Keim, Paul S; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.

In: Frontiers in Genetics, Vol. 6, No. JUL, 235, 2015.


Olson, ND, Lund, SP, Colman, RE, Foster, JT, Sahl, JW, Schupp, JM, Keim, PS, Morrow, JB, Salit, ML & Zook, JM 2015, 'Best practices for evaluating single nucleotide variant calling methods for microbial genomics', Frontiers in Genetics, vol. 6, no. JUL, 235. https://doi.org/10.3389/fgene.2015.00235
@article{12c12869d6124039b8ebb6b5127cd130,
title = "Best practices for evaluating single nucleotide variant calling methods for microbial genomics",
abstract = "Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.",
keywords = "Indel, Next-generation sequencing, Performance metrics, Single nucleotide variants, Variant calling",
author = "Olson, {Nathan D.} and Lund, {Steven P.} and Colman, {Rebecca E.} and Foster, {Jeffrey T} and Sahl, {Jason W.} and Schupp, {James M.} and Keim, {Paul S} and Morrow, {Jayne B.} and Salit, {Marc L.} and Zook, {Justin M.}",
year = "2015",
doi = "10.3389/fgene.2015.00235",
language = "English (US)",
volume = "6",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "JUL",

}

TY - JOUR

T1 - Best practices for evaluating single nucleotide variant calling methods for microbial genomics

AU - Olson, Nathan D.

AU - Lund, Steven P.

AU - Colman, Rebecca E.

AU - Foster, Jeffrey T

AU - Sahl, Jason W.

AU - Schupp, James M.

AU - Keim, Paul S

AU - Morrow, Jayne B.

AU - Salit, Marc L.

AU - Zook, Justin M.

PY - 2015

Y1 - 2015

N2 - Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.

KW - Indel

KW - Next-generation sequencing

KW - Performance metrics

KW - Single nucleotide variants

KW - Variant calling

UR - http://www.scopus.com/inward/record.url?scp=84940106715&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940106715&partnerID=8YFLogxK

U2 - 10.3389/fgene.2015.00235

DO - 10.3389/fgene.2015.00235

M3 - Article

AN - SCOPUS:84940106715

VL - 6

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

IS - JUL

M1 - 235

ER -