The American national corpus: More than the web can provide

Nancy Ide, Randi Reppen, Keith Suderman

Research output: Contribution to conference, Paper

18 Scopus citations

Abstract

The American National Corpus (ANC) project is developing a corpus comparable to the British National Corpus (BNC), covering American English. Recent interest in the web as a source of corpus materials has caused some in the language processing community to suggest that the development of a corpus of American English is unnecessary. However, we argue that far from being rendered superfluous by the availability of web materials, the ANC is likely to provide a resource for developing web acquisition techniques to support tasks such as genre and language detection and automatic annotation. This paper presents a comparison of the ANC in terms of both content and format with a test corpus compiled from web data, and a discussion of points of intersection and divergence.

Original language: English (US)
Pages: 839-844
Number of pages: 6
State: Published - Jan 1 2002
Event: 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain
Duration: May 29 2002 to May 31 2002

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences

Cite this

Ide, N., Reppen, R., & Suderman, K. (2002). The American national corpus: More than the web can provide. Paper presented at the 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain, pp. 839-844.