Evaluating reliability in quantitative vocabulary studies: The influence of corpus design and composition

Don Miller, Douglas E Biber

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora - and the associated lists - are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.

Original language: English (US)
Pages (from-to): 30-53
Number of pages: 24
Journal: International Journal of Corpus Linguistics
Volume: 20
Issue number: 1
DOI: 10.1075/ijcl.20.1.02mil
State: Published - 2015

Fingerprint

vocabulary
discourse
textbook
psychology
linguistics
university
experiment
representativeness

Keywords

  • Corpus representativeness
  • Lexical diversity and variability
  • Reliability and validity
  • Word lists

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

@article{961ff226c7ff497ba48e64104d8c5e37,
title = "Evaluating reliability in quantitative vocabulary studies: The influence of corpus design and composition",
abstract = "Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora - and the associated lists - are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.",
keywords = "Corpus representativeness, Lexical diversity and variability, Reliability and validity, Word lists",
author = "Miller, Don and Biber, {Douglas E}",
year = "2015",
doi = "10.1075/ijcl.20.1.02mil",
language = "English (US)",
volume = "20",
pages = "30--53",
journal = "International Journal of Corpus Linguistics",
issn = "1384-6655",
publisher = "John Benjamins Publishing Company",
number = "1",
}

TY  - JOUR
T1  - Evaluating reliability in quantitative vocabulary studies
T2  - The influence of corpus design and composition
AU  - Miller, Don
AU  - Biber, Douglas E
PY  - 2015
Y1  - 2015
N2  - Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora - and the associated lists - are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.
AB  - Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora - and the associated lists - are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.
KW  - Corpus representativeness
KW  - Lexical diversity and variability
KW  - Reliability and validity
KW  - Word lists
UR  - http://www.scopus.com/inward/record.url?scp=84926471509&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84926471509&partnerID=8YFLogxK
U2  - 10.1075/ijcl.20.1.02mil
DO  - 10.1075/ijcl.20.1.02mil
M3  - Article
AN  - SCOPUS:84926471509
VL  - 20
SP  - 30
EP  - 53
JO  - International Journal of Corpus Linguistics
JF  - International Journal of Corpus Linguistics
SN  - 1384-6655
IS  - 1
ER  -