On the (non)utility of Juilland's D to measure lexical dispersion in large corpora

Douglas E Biber, Randi Reppen, Erin Schnur, Romy Ghanem

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

This paper explores the effectiveness of Juilland's D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explored the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distributions of those words. The experiments demonstrate that the effective range for D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries' DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as research questions in other areas of corpus linguistics.
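The contrast between the two measures can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes the commonly cited formulation D = 1 - V / sqrt(n - 1), where V is the coefficient of variation of a word's frequencies across n equal-sized corpus-parts, and DP as half the summed absolute differences between observed and expected proportions. The function names and the toy frequency lists are hypothetical.

```python
import math


def juilland_d(freqs):
    """Juilland's D, assuming equal-sized corpus-parts (assumed formulation)."""
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two corpus-parts")
    mean = sum(freqs) / n
    if mean == 0:
        raise ValueError("word does not occur in the corpus")
    # Population standard deviation of the per-part frequencies.
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    # Coefficient of variation, scaled by sqrt(n - 1).
    return 1 - (sd / mean) / math.sqrt(n - 1)


def gries_dp(freqs, part_sizes):
    """Gries' DP: 0 means perfectly even, values near 1 mean highly clumped."""
    total_freq = sum(freqs)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(freqs, part_sizes)
    )


# Toy data (hypothetical): 100 occurrences, all in the first 1% of the corpus.
coarse = [100] + [0] * 9            # corpus split into 10 parts
fine = [10] * 10 + [0] * 990        # same corpus split into 1000 parts

d_coarse = juilland_d(coarse)
d_fine = juilland_d(fine)
dp_coarse = gries_dp(coarse, [1] * 10)
dp_fine = gries_dp(fine, [1] * 1000)
```

Under these assumptions, the same skewed distribution gives D = 0 with 10 parts but roughly 0.69 with 1000 parts, while DP stays high (0.9 and 0.99). This matches the direction of the effect described in the abstract, though the numbers are only illustrative.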

Original language: English (US)
Pages (from-to): 439-464
Number of pages: 26
Journal: International Journal of Corpus Linguistics
Volume: 21
Issue number: 4
DOI: 10.1075/ijcl.21.4.01bib
State: Published - 2016

Keywords

  • Dispersion measures
  • Gries' DP
  • Juilland's D
  • Quantitative methods for corpus analysis
  • Vocabulary lists

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

On the (non)utility of Juilland's D to measure lexical dispersion in large corpora. / Biber, Douglas E; Reppen, Randi; Schnur, Erin; Ghanem, Romy.

In: International Journal of Corpus Linguistics, Vol. 21, No. 4, 2016, p. 439-464.

@article{f5ff7a0796b24ce3b6c4a1d5a2a4a50f,
title = "On the (non)utility of Juilland's D to measure lexical dispersion in large corpora",
abstract = "This paper explores the effectiveness of Juilland's D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explored the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distributions of those words. The experiments demonstrate that the effective range for D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries' DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as research questions in other areas of corpus linguistics.",
keywords = "Dispersion measures, Gries' DP, Juilland's D, Quantitative methods for corpus analysis, Vocabulary lists",
author = "Biber, {Douglas E} and Randi Reppen and Erin Schnur and Romy Ghanem",
year = "2016",
doi = "10.1075/ijcl.21.4.01bib",
language = "English (US)",
volume = "21",
pages = "439--464",
journal = "International Journal of Corpus Linguistics",
issn = "1384-6655",
publisher = "John Benjamins Publishing Company",
number = "4",
}

TY - JOUR

T1 - On the (non)utility of Juilland's D to measure lexical dispersion in large corpora

AU - Biber, Douglas E

AU - Reppen, Randi

AU - Schnur, Erin

AU - Ghanem, Romy

PY - 2016

Y1 - 2016

N2 - This paper explores the effectiveness of Juilland's D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explored the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distributions of those words. The experiments demonstrate that the effective range for D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries' DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as research questions in other areas of corpus linguistics.

AB - This paper explores the effectiveness of Juilland's D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explored the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distributions of those words. The experiments demonstrate that the effective range for D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries' DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as research questions in other areas of corpus linguistics.

KW - Dispersion measures

KW - Gries' DP

KW - Juilland's D

KW - Quantitative methods for corpus analysis

KW - Vocabulary lists

UR - http://www.scopus.com/inward/record.url?scp=84999143234&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84999143234&partnerID=8YFLogxK

U2 - 10.1075/ijcl.21.4.01bib

DO - 10.1075/ijcl.21.4.01bib

M3 - Article

VL - 21

SP - 439

EP - 464

JO - International Journal of Corpus Linguistics

JF - International Journal of Corpus Linguistics

SN - 1384-6655

IS - 4

ER -