A compilation of soybean ESTs: Generation and analysis

Randy Shoemaker, Paul S Keim, Lila Vodkin, Ernest Retzel, Sandra W. Clifton, Robert Waterston, David Smoller, Virginia Coryell, Anupama Khanna, John Erpelding, Xiaowu Gai, Volker Brendel, Christina Raph-Schmidt, E. G. Shoop, C. J. Vielweber, Matt Schmatz, Deana Pape, Yvette Bowers, Brenda Theising, John Martin, Michael Dante, Todd Wylie, Cheryl Granger

Research output: Contribution to journal › Article

101 Citations (Scopus)

Abstract

Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Given the size and complexity of the soybean genome, an alternative approach is targeted random-gene sequencing, which provides an immediate and productive method of gene discovery. In this study, more than 120 000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs coalesced into 16 928 contigs and 17 336 singletons. On average, each contig was composed of 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. Using only those libraries generating more than 800 ESTs each and only those contigs with 10 or more ESTs each, correlated patterns of gene expression among libraries and genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available gene sequences as well as valuable insight into the structure, function, and evolution of a model crop legume genome.
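
The filtering step described in the abstract can be illustrated with a minimal sketch. Only the two thresholds (libraries with more than 800 ESTs, contigs with 10 or more ESTs) come from the abstract. The library names, the counts, the choice to size contigs across all libraries, and the use of Pearson correlation on raw counts are all assumptions made for illustration; the abstract does not state which similarity measure was used.

```python
from math import sqrt

MIN_LIBRARY_ESTS = 800  # abstract: libraries generating more than 800 ESTs
MIN_CONTIG_ESTS = 10    # abstract: contigs with 10 or more ESTs


def filter_profiles(counts, library_sizes):
    """Return ({contig: [EST count per kept library]}, kept library names).

    counts: {contig: {library: ESTs from that library in the contig}}
    library_sizes: {library: total ESTs sequenced from that library}
    """
    kept = sorted(lib for lib, n in library_sizes.items() if n > MIN_LIBRARY_ESTS)
    profiles = {}
    for contig, per_library in counts.items():
        # Assumption: contig size is counted across all libraries, kept or not.
        if sum(per_library.values()) >= MIN_CONTIG_ESTS:
            profiles[contig] = [per_library.get(lib, 0) for lib in kept]
    return profiles, kept


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    library_sizes = {"root": 1200, "leaf": 950, "seed": 3000, "flower": 400}
    counts = {
        "contig_A": {"root": 8, "leaf": 1, "flower": 2},
        "contig_B": {"root": 16, "leaf": 2},
        "contig_C": {"leaf": 1, "seed": 12},
        "contig_D": {"root": 2, "seed": 1},
    }
    profiles, kept = filter_profiles(counts, library_sizes)
    print("libraries kept:", kept)
    names = sorted(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            print(first, second, round(r, 3))
```

In this toy input the small library and the small contig are dropped before any comparison, and contigs whose counts rise and fall together across the remaining libraries receive a high correlation. A real analysis would probably normalise counts by library size first; that step is omitted here to keep the sketch short.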

Original language: English (US)
Pages (from-to): 329-338
Number of pages: 10
Journal: Genome
Volume: 45
Issue number: 2
DOI: https://doi.org/10.1139/g01-150
State: Published - 2002
Externally published: Yes

Fingerprint

Expressed Sequence Tags
Soybeans
Genome
Gene Library
Libraries
Genes
crop models
Genetic Association Studies
Fabaceae
cDNA libraries
legumes
Gene Expression
nucleotide sequences
organisms

Keywords

  • Functional genomics
  • Genome sequencing
  • Glycine max

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biotechnology
  • Genetics
  • Genetics(clinical)

Cite this

Shoemaker, R., Keim, P. S., Vodkin, L., Retzel, E., Clifton, S. W., Waterston, R., ... Granger, C. (2002). A compilation of soybean ESTs: Generation and analysis. Genome, 45(2), 329-338. https://doi.org/10.1139/g01-150

@article{2a2ac5682e5a4b1582e7b8162816808a,
title = "A compilation of soybean ESTs: Generation and analysis",
keywords = "Functional genomics, Genome sequencing, Glycine max",
author = "Randy Shoemaker and Keim, {Paul S} and Lila Vodkin and Ernest Retzel and Clifton, {Sandra W.} and Robert Waterston and David Smoller and Virginia Coryell and Anupama Khanna and John Erpelding and Xiaowu Gai and Volker Brendel and Christina Raph-Schmidt and Shoop, {E. G.} and Vielweber, {C. J.} and Matt Schmatz and Deana Pape and Yvette Bowers and Brenda Theising and John Martin and Michael Dante and Todd Wylie and Cheryl Granger",
year = "2002",
doi = "10.1139/g01-150",
language = "English (US)",
volume = "45",
pages = "329--338",
journal = "Genome / National Research Council Canada = Genome / Conseil national de recherches Canada",
issn = "0831-2796",
publisher = "National Research Council of Canada",
number = "2",

}

TY  - JOUR
T1  - A compilation of soybean ESTs
T2  - Generation and analysis
AU  - Shoemaker, Randy
AU  - Keim, Paul S
AU  - Vodkin, Lila
AU  - Retzel, Ernest
AU  - Clifton, Sandra W.
AU  - Waterston, Robert
AU  - Smoller, David
AU  - Coryell, Virginia
AU  - Khanna, Anupama
AU  - Erpelding, John
AU  - Gai, Xiaowu
AU  - Brendel, Volker
AU  - Raph-Schmidt, Christina
AU  - Shoop, E. G.
AU  - Vielweber, C. J.
AU  - Schmatz, Matt
AU  - Pape, Deana
AU  - Bowers, Yvette
AU  - Theising, Brenda
AU  - Martin, John
AU  - Dante, Michael
AU  - Wylie, Todd
AU  - Granger, Cheryl
PY  - 2002
Y1  - 2002
AB  - Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Given the size and complexity of the soybean genome, an alternative approach is targeted random-gene sequencing, which provides an immediate and productive method of gene discovery. In this study, more than 120 000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs coalesced into 16 928 contigs and 17 336 singletons. On average, each contig was composed of 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. Using only those libraries generating more than 800 ESTs each and only those contigs with 10 or more ESTs each, correlated patterns of gene expression among libraries and genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available gene sequences as well as valuable insight into the structure, function, and evolution of a model crop legume genome.
KW  - Functional genomics
KW  - Genome sequencing
KW  - Glycine max
UR  - http://www.scopus.com/inward/record.url?scp=0036009442&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0036009442&partnerID=8YFLogxK
U2  - 10.1139/g01-150
DO  - 10.1139/g01-150
M3  - Article
C2  - 11962630
AN  - SCOPUS:0036009442
VL  - 45
SP  - 329
EP  - 338
JO  - Genome / National Research Council Canada = Genome / Conseil national de recherches Canada
JF  - Genome / National Research Council Canada = Genome / Conseil national de recherches Canada
SN  - 0831-2796
IS  - 2
ER  -