Developing a bottom-up, user-based method of web register classification

Jesse Egbert, Douglas Biber, Mark Davies

Research output: Contribution to journal › Article

16 Scopus citations

Abstract

This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.

Original language: English (US)
Pages (from-to): 1817-1831
Number of pages: 15
Journal: Journal of the Association for Information Science and Technology
Volume: 66
Issue number: 9
State: Published - Sep 1 2015

Keywords

  • classification
  • discourse analysis
  • linguistic analysis

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences
