Rapid pattern development for concept recognition systems: Application to point mutations

James G Caporaso, William A. Baumgartner, David A. Randolph, K. Bretonnel Cohen, Lawrence Hunter

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

The primary biomedical literature is being generated at an unprecedented rate, and researchers cannot keep abreast of new developments in their fields. Biomedical natural language processing is being developed to address this issue, but building reliable systems often requires many expert-hours. We present an approach for automatically developing collections of regular expressions to drive high-performance concept recognition systems with minimal human interaction. We applied our approach to develop MutationFinder, a system for automatically extracting mentions of point mutations from the text. MutationFinder achieves performance equivalent to or better than manually developed mutation recognition systems, but the generation of its 759 patterns has required only 5.5 expert-hours. We also discuss the development and evaluation of our recently published high-quality, human-annotated gold standard corpus, which contains 1,515 complete point mutation mentions annotated in 813 abstracts. Both MutationFinder and the complete corpus are publicly available at http://mutationfinder.sourceforge.net/.
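
To make the idea of regular-expression-driven mutation extraction concrete, the following is a minimal Python sketch. It is not MutationFinder's code and does not reproduce any of its 759 patterns; the three patterns, the `find_point_mutations` function, and the normalized `(wild-type, position, mutant)` tuple are assumptions chosen for illustration.

```python
import re

# Illustrative assumption: residues are written with standard one-letter or
# three-letter amino acid codes. These are NOT MutationFinder's patterns.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
THREE = "|".join(THREE_TO_ONE)

# Each pattern captures wild-type residue, sequence position, mutant residue.
PATTERNS = [
    # One-letter shorthand, e.g. "A123T"
    re.compile(rf"\b(?P<wt>[{ONE_LETTER}])(?P<pos>[1-9]\d*)(?P<mut>[{ONE_LETTER}])\b"),
    # Three-letter shorthand, e.g. "Gly12Val"
    re.compile(rf"\b(?P<wt>{THREE})(?P<pos>[1-9]\d*)(?P<mut>{THREE})\b"),
    # Simple textual form, e.g. "Ala64 to Thr"
    re.compile(rf"\b(?P<wt>{THREE})(?P<pos>[1-9]\d*) to (?P<mut>{THREE})\b"),
]


def find_point_mutations(text):
    """Return a set of (wild_type, position, mutant) tuples found in text."""
    found = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            # Normalize three-letter codes to one-letter codes.
            wt = THREE_TO_ONE.get(match.group("wt"), match.group("wt"))
            mut = THREE_TO_ONE.get(match.group("mut"), match.group("mut"))
            if wt != mut:  # a point mutation changes the residue
                found.add((wt, int(match.group("pos")), mut))
    return found


if __name__ == "__main__":
    sentence = "The A123T and Gly12Val substitutions, and Ala64 to Thr, were studied."
    print(sorted(find_point_mutations(sentence)))
```

A hand-written set this small would likely miss many phrasings and would also match look-alike tokens such as cell-line names, which suggests why a much larger pattern collection and a gold standard corpus for evaluation are useful.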

Original language: English (US)
Pages (from-to): 1233-1259
Number of pages: 27
Journal: Journal of Bioinformatics and Computational Biology
Volume: 5
Issue number: 6
DOI: 10.1142/S0219720007003144
State: Published - Dec 2007
Externally published: Yes

Fingerprint

Point Mutation
Natural Language Processing
Processing
Research Personnel
Mutation

Keywords

  • Biomedical natural language processing
  • Concept recognition
  • Corpus construction
  • Information extraction
  • Mutations
  • Pattern learning
  • Text mining

ASJC Scopus subject areas

  • Medicine (all)
  • Cell Biology

Cite this

Rapid pattern development for concept recognition systems: Application to point mutations. / Caporaso, James G; Baumgartner, William A.; Randolph, David A.; Cohen, K. Bretonnel; Hunter, Lawrence.

In: Journal of Bioinformatics and Computational Biology, Vol. 5, No. 6, 12.2007, p. 1233-1259.

@article{d800fc2deea046dc991626713546e6d7,
title = "Rapid pattern development for concept recognition systems: Application to point mutations",
abstract = "The primary biomedical literature is being generated at an unprecedented rate, and researchers cannot keep abreast of new developments in their fields. Biomedical natural language processing is being developed to address this issue, but building reliable systems often requires many expert-hours. We present an approach for automatically developing collections of regular expressions to drive high-performance concept recognition systems with minimal human interaction. We applied our approach to develop MutationFinder, a system for automatically extracting mentions of point mutations from the text. MutationFinder achieves performance equivalent to or better than manually developed mutation recognition systems, but the generation of its 759 patterns has required only 5.5 expert-hours. We also discuss the development and evaluation of our recently published high-quality, human-annotated gold standard corpus, which contains 1,515 complete point mutation mentions annotated in 813 abstracts. Both MutationFinder and the complete corpus are publicly available at http://mutationfinder.sourceforge.net/.",
keywords = "Biomedical natural language processing, Concept recognition, Corpus construction, Information extraction, Mutations, Pattern learning, Text mining",
author = "Caporaso, {James G} and Baumgartner, {William A.} and Randolph, {David A.} and Cohen, {K. Bretonnel} and Hunter, Lawrence",
year = "2007",
month = "12",
doi = "10.1142/S0219720007003144",
language = "English (US)",
volume = "5",
pages = "1233--1259",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "6",

}

TY - JOUR
T1 - Rapid pattern development for concept recognition systems
T2 - Application to point mutations
AU - Caporaso, James G
AU - Baumgartner, William A.
AU - Randolph, David A.
AU - Cohen, K. Bretonnel
AU - Hunter, Lawrence
PY - 2007/12
Y1 - 2007/12

AB - The primary biomedical literature is being generated at an unprecedented rate, and researchers cannot keep abreast of new developments in their fields. Biomedical natural language processing is being developed to address this issue, but building reliable systems often requires many expert-hours. We present an approach for automatically developing collections of regular expressions to drive high-performance concept recognition systems with minimal human interaction. We applied our approach to develop MutationFinder, a system for automatically extracting mentions of point mutations from the text. MutationFinder achieves performance equivalent to or better than manually developed mutation recognition systems, but the generation of its 759 patterns has required only 5.5 expert-hours. We also discuss the development and evaluation of our recently published high-quality, human-annotated gold standard corpus, which contains 1,515 complete point mutation mentions annotated in 813 abstracts. Both MutationFinder and the complete corpus are publicly available at http://mutationfinder.sourceforge.net/.

KW - Biomedical natural language processing
KW - Concept recognition
KW - Corpus construction
KW - Information extraction
KW - Mutations
KW - Pattern learning
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=37849006444&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37849006444&partnerID=8YFLogxK
U2 - 10.1142/S0219720007003144
DO - 10.1142/S0219720007003144
M3 - Article
VL - 5
SP - 1233
EP - 1259
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
SN - 0219-7200
IS - 6
ER -