Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague

Eric D. Merkley, Landon H. Sego, Andy Lin, Owen P. Leiser, Brooke L. Deatherage Kaiser, Joshua N. Adkins, Paul S. Keim, David M. Wagner, Helen W. Kreuzer

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally-occurring environmental (“wild”) strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone have thus far not provided a clear signature, perhaps due to a lack of understanding of how diverse genome changes lead to convergent phenotypes, difficulty in detecting certain types of mutations, or because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild strains. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild; the classifier proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data were derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well. Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.
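
The abstract names the technique (lasso logistic regression on presence/absence features, scored by replicated 10-fold cross-validation) but gives no implementation details. The following is only a minimal sketch of that general technique, not the authors' code: the data are synthetic, the proximal-gradient solver, the penalty strength `lam`, the step size, the iteration count, the number of repeats, and every function name are assumptions chosen for illustration, and the folds are plain (unstratified).

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.05, step=0.5, n_iter=150):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    lam, step and n_iter are illustrative assumptions, not values from the paper.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is left unpenalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Return 1 when the predicted probability exceeds 0.5, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def repeated_kfold_accuracy(X, y, k=10, repeats=2, seed=0):
    """Mean accuracy over `repeats` independent shuffles of k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w, b = fit_lasso_logistic([X[i] for i in train],
                                      [y[i] for i in train])
            correct += sum(predict(w, b, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(X))
    return sum(accuracies) / len(accuracies)


def make_synthetic(n=40, p=12, n_informative=3, seed=1):
    """Synthetic presence/absence matrix; only the first columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = "laboratory-adapted", 0 = "wild" (arbitrary coding)
        row = []
        for j in range(p):
            if j < n_informative:
                prob = 0.95 if label == 1 else 0.05
            else:
                prob = 0.5  # uninformative "protein"
            row.append(1.0 if rng.random() < prob else 0.0)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    w, _ = fit_lasso_logistic(X, y)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print("features with non-zero weight:", selected)
    print("mean cross-validated accuracy:", repeated_kfold_accuracy(X, y))
```

The L1 penalty drives the weights of uninformative features to exactly zero, which is how a lasso model doubles as a feature selector; repeating the 10-fold split with fresh shuffles reduces the dependence of the accuracy estimate on any single partition.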

Original language: English (US)
Article number: e0183478
Journal: PLoS One
Volume: 12
Issue number: 8
DOIs: https://doi.org/10.1371/journal.pone.0183478
State: Published - Aug 1 2017

Fingerprint

Yersinia pestis
Plague
Proteins
Proteomics
Classifiers
Phenotype
Confounding Factors (Epidemiology)
DNA sequences
Biomarkers
Epigenomics
epigenetics
Culture Media
Logistics
Genes

ASJC Scopus subject areas

  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Merkley, E. D., Sego, L. H., Lin, A., Leiser, O. P., Kaiser, B. L. D., Adkins, J. N., ... Kreuzer, H. W. (2017). Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague. PLoS One, 12(8), [e0183478]. https://doi.org/10.1371/journal.pone.0183478

@article{66a54f49fe364dffad77c45c224ceee3,
title = "Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague",
author = "Merkley, {Eric D.} and Sego, {Landon H.} and Lin, Andy and Leiser, {Owen P.} and Kaiser, {Brooke L. Deatherage} and Adkins, {Joshua N.} and Keim, {Paul S.} and Wagner, {David M.} and Kreuzer, {Helen W.}",
year = "2017",
month = "8",
day = "1",
doi = "10.1371/journal.pone.0183478",
language = "English (US)",
volume = "12",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "8",

}

TY - JOUR

T1 - Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague

AU - Merkley, Eric D.

AU - Sego, Landon H.

AU - Lin, Andy

AU - Leiser, Owen P.

AU - Kaiser, Brooke L. Deatherage

AU - Adkins, Joshua N.

AU - Keim, Paul S

AU - Wagner, David M

AU - Kreuzer, Helen W.

PY - 2017/8/1

Y1 - 2017/8/1

UR - http://www.scopus.com/inward/record.url?scp=85028524864&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028524864&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0183478

DO - 10.1371/journal.pone.0183478

M3 - Article

C2 - 28854255

AN - SCOPUS:85028524864

VL - 12

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 8

M1 - e0183478

ER -