Incorporating text dispersion into keyword analyses

Jesse Egbert, Douglas E Biber

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Keyword analysis has become an indispensable tool for discourse analysts, being applied to identify the words that are especially characteristic of the texts in a target discourse domain. But, surprisingly, the statistical computation of keyness makes no reference to those texts. Rather, once a corpus has been constructed, it is treated as a homogeneous whole for the computation of keyness. As a result, the keywords in such lists are relatively frequent in the corpus, but they are often not widely dispersed across the texts of that corpus and are thus not truly representative of the target discourse domain. The purpose of this study is to propose a new method for keyword analysis - text dispersion keyness - that is based on text dispersion, rather than corpus frequency. We compare the effectiveness of this measure to four other methods for computing keyness, carrying out a series of case studies to identify the keywords that are typical of online travel blogs. A variety of quantitative and qualitative analyses are carried out to compare these methods based on their content-generalisability and content-distinctiveness, demonstrating that text dispersion keyness is a superior measure for generating keyword lists.
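The contrast the abstract draws between frequency-based keyness and text dispersion keyness can be illustrated with a minimal Python sketch. It rests on an assumption: that text dispersion keyness can be approximated by applying the same log-likelihood (G2) statistic to text counts (the number of texts containing a word, with the number of texts as the corpus size) instead of token counts. The article's exact formulation may differ. The tokenizer, function names and toy corpora below are hypothetical illustrations, not the authors' code or data.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpora for illustration only (not data from the study).
TARGET = [  # stands in for a corpus of travel blog posts
    "we took the train to the coast and the hotel was lovely",
    "our flight landed late but the hotel staff were kind",
    "the hotel breakfast was great and we walked to the old town",
    "goulash goulash goulash goulash goulash goulash at every meal in budapest",
]
REFERENCE = [  # stands in for a general reference corpus
    "the committee approved the budget for the next year",
    "the patient reported mild symptoms after the treatment",
    "the results suggest that the model fits the data",
    "the court ruled that the contract was valid",
]


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def signed_log_likelihood(a: int, b: int, size_a: int, size_b: int) -> float:
    """Two-term log-likelihood (G2) as commonly used in keyword tools.

    a, b: observations of an item in the target and reference corpora;
    size_a, size_b: the corresponding corpus sizes. The result is made
    negative when the item is relatively less frequent in the target.
    """
    e_a = size_a * (a + b) / (size_a + size_b)  # expected count in target
    e_b = size_b * (a + b) / (size_a + size_b)  # expected count in reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e_a)
    if b > 0:
        g2 += b * math.log(b / e_b)
    g2 *= 2
    return g2 if a / size_a >= b / size_b else -g2


def frequency_keyness(target: list[str], reference: list[str]) -> dict[str, float]:
    # Corpus-frequency keyness: each corpus is one homogeneous bag of tokens.
    t = Counter(tok for text in target for tok in tokenize(text))
    r = Counter(tok for text in reference for tok in tokenize(text))
    n_t, n_r = sum(t.values()), sum(r.values())
    return {w: signed_log_likelihood(t[w], r[w], n_t, n_r) for w in t}


def text_dispersion_keyness(target: list[str], reference: list[str]) -> dict[str, float]:
    # Assumption: count the number of texts containing each word (set() drops
    # repeats within a text) and use the number of texts as the corpus size.
    t = Counter(tok for text in target for tok in set(tokenize(text)))
    r = Counter(tok for text in reference for tok in set(tokenize(text)))
    return {w: signed_log_likelihood(t[w], r[w], len(target), len(reference)) for w in t}


if __name__ == "__main__":
    freq = frequency_keyness(TARGET, REFERENCE)
    td = text_dispersion_keyness(TARGET, REFERENCE)
    print("frequency keyness top 3:", sorted(freq, key=freq.get, reverse=True)[:3])
    print("text dispersion keyness top 3:", sorted(td, key=td.get, reverse=True)[:3])
```

With these toy corpora, the frequency-based list is headed by "goulash", which occurs six times but in only one of the four target texts, while the dispersion-based list is headed by "hotel", which appears in three of the four target texts.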

Original language: English (US)
Pages (from-to): 77-104
Number of pages: 28
Journal: Corpora
Volume: 14
Issue number: 1
DOIs: 10.3366/cor.2019.0162
State: Published - Jan 1 2018

Fingerprint

  • discourse
  • text analysis
  • weblog
  • travel

Key Words

  • Discourse
  • Text Analysis
  • Blogs
  • Distinctiveness

Keywords

  • Analysis
  • Distinctiveness
  • Generalisability
  • Lexical dispersion
  • Word importance

ASJC Scopus subject areas

  • Language and Linguistics
  • Visual Arts and Performing Arts
  • Music
  • Linguistics and Language

Cite this

Incorporating text dispersion into keyword analyses. / Egbert, Jesse; Biber, Douglas E.
In: Corpora, Vol. 14, No. 1, 01.01.2018, p. 77-104.
Research output: Contribution to journal › Article

Egbert, Jesse ; Biber, Douglas E. / Incorporating text dispersion into keyword analyses. In: Corpora. 2018 ; Vol. 14, No. 1. pp. 77-104.
@article{a22f4781386a4c6690433b97560133d4,
title = "Incorporating text dispersion into keyword analyses",
abstract = "Keyword analysis has become an indispensable tool for discourse analysts, being applied to identify the words that are especially characteristic of the texts in a target discourse domain. But, surprisingly, the statistical computation of keyness makes no reference to those texts. Rather, once a corpus has been constructed, it is treated as a homogeneous whole for the computation of keyness. As a result, the keywords in such lists are relatively frequent in the corpus, but they are often not widely dispersed across the texts of that corpus and are thus not truly representative of the target discourse domain. The purpose of this study is to propose a new method for keyword analysis - text dispersion keyness - that is based on text dispersion, rather than corpus frequency. We compare the effectiveness of this measure to four other methods for computing keyness, carrying out a series of case studies to identify the keywords that are typical of online travel blogs. A variety of quantitative and qualitative analyses are carried out to compare these methods based on their content-generalisability and content-distinctiveness, demonstrating that text dispersion keyness is a superior measure for generating keyword lists.",
keywords = "Analysis, Distinctiveness, Generalisability, Lexical dispersion, Word importance",
author = "Egbert, Jesse and Biber, {Douglas E}",
year = "2018",
month = "1",
day = "1",
doi = "10.3366/cor.2019.0162",
language = "English (US)",
volume = "14",
pages = "77--104",
journal = "Corpora",
publisher = "Edinburgh University Press",
number = "1",

}

TY  - JOUR
T1  - Incorporating text dispersion into keyword analyses
AU  - Egbert, Jesse
AU  - Biber, Douglas E
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - Keyword analysis has become an indispensable tool for discourse analysts, being applied to identify the words that are especially characteristic of the texts in a target discourse domain. But, surprisingly, the statistical computation of keyness makes no reference to those texts. Rather, once a corpus has been constructed, it is treated as a homogeneous whole for the computation of keyness. As a result, the keywords in such lists are relatively frequent in the corpus, but they are often not widely dispersed across the texts of that corpus and are thus not truly representative of the target discourse domain. The purpose of this study is to propose a new method for keyword analysis - text dispersion keyness - that is based on text dispersion, rather than corpus frequency. We compare the effectiveness of this measure to four other methods for computing keyness, carrying out a series of case studies to identify the keywords that are typical of online travel blogs. A variety of quantitative and qualitative analyses are carried out to compare these methods based on their content-generalisability and content-distinctiveness, demonstrating that text dispersion keyness is a superior measure for generating keyword lists.
AB  - Keyword analysis has become an indispensable tool for discourse analysts, being applied to identify the words that are especially characteristic of the texts in a target discourse domain. But, surprisingly, the statistical computation of keyness makes no reference to those texts. Rather, once a corpus has been constructed, it is treated as a homogeneous whole for the computation of keyness. As a result, the keywords in such lists are relatively frequent in the corpus, but they are often not widely dispersed across the texts of that corpus and are thus not truly representative of the target discourse domain. The purpose of this study is to propose a new method for keyword analysis - text dispersion keyness - that is based on text dispersion, rather than corpus frequency. We compare the effectiveness of this measure to four other methods for computing keyness, carrying out a series of case studies to identify the keywords that are typical of online travel blogs. A variety of quantitative and qualitative analyses are carried out to compare these methods based on their content-generalisability and content-distinctiveness, demonstrating that text dispersion keyness is a superior measure for generating keyword lists.
KW  - Analysis
KW  - Distinctiveness
KW  - Generalisability
KW  - Lexical dispersion
KW  - Word importance
UR  - http://www.scopus.com/inward/record.url?scp=85068500025&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85068500025&partnerID=8YFLogxK
U2  - 10.3366/cor.2019.0162
DO  - 10.3366/cor.2019.0162
M3  - Article
AN  - SCOPUS:85068500025
VL  - 14
SP  - 77
EP  - 104
JO  - Corpora
JF  - Corpora
IS  - 1
ER  - 