Developing a bottom-up, user-based method of web register classification

Jesse Egbert, Douglas E Biber, Mark Davies

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

Original language: English (US)
Pages (from-to): 1817-1831
Number of pages: 15
Journal: Journal of the Association for Information Science and Technology
Volume: 66
Issue number: 9
DOI: 10.1002/asi.23308
State: Published - Sep 1 2015

Fingerprint

Decision trees
Internet
Testing
Costs
Bottom-up
World Wide Web

Keywords

  • classification
  • discourse analysis
  • linguistic analysis

ASJC Scopus subject areas

  • Information Systems and Management
  • Library and Information Sciences
  • Computer Networks and Communications
  • Information Systems

Cite this

Developing a bottom-up, user-based method of web register classification. / Egbert, Jesse; Biber, Douglas E; Davies, Mark.

In: Journal of the Association for Information Science and Technology, Vol. 66, No. 9, 01.09.2015, p. 1817-1831.

@article{99961b1f40484631b7404f385f602fe7,
title = "Developing a bottom-up, user-based method of web register classification",
abstract = "This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.",
keywords = "classification, discourse analysis, linguistic analysis",
author = "Egbert, Jesse and Biber, {Douglas E} and Davies, Mark",
year = "2015",
month = sep,
day = "1",
doi = "10.1002/asi.23308",
language = "English (US)",
volume = "66",
pages = "1817--1831",
journal = "Journal of the Association for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "9",
}

TY - JOUR

T1 - Developing a bottom-up, user-based method of web register classification

AU - Egbert, Jesse

AU - Biber, Douglas E

AU - Davies, Mark

PY - 2015/9/1

Y1 - 2015/9/1

N2 - This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

AB - This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

KW - classification

KW - discourse analysis

KW - linguistic analysis

UR - http://www.scopus.com/inward/record.url?scp=84927779026&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84927779026&partnerID=8YFLogxK

U2 - 10.1002/asi.23308

DO - 10.1002/asi.23308

M3 - Article

VL - 66

SP - 1817

EP - 1831

JO - Journal of the Association for Information Science and Technology

JF - Journal of the Association for Information Science and Technology

SN - 2330-1635

IS - 9

ER -