Improved automatic English proficiency rating of unconstrained speech with multiple corpora

David O. Johnson, Okim Kang, Romy Ghanem

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of that software’s algorithms: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech from a Pearson’s correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech and approaches that of human raters in terms of inter-rater reliability.
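
The abstract reports its results as Pearson's correlation between automatic scores and human proficiency ratings. As a minimal sketch of how such a figure is computed, the Python below implements the standard formula. The function name pearson_r and the score lists are hypothetical and invented for illustration only; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores, invented for illustration only (not from the paper).
human_ratings = [1, 2, 2, 3, 3, 4, 4]
machine_scores = [1, 2, 3, 3, 2, 4, 3]
print(round(pearson_r(human_ratings, machine_scores), 3))
```

A value near 1 means the automatic scores rise and fall with the human ratings; the reported change from 0.677 to 0.718 is an increase on this scale.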

Original language: English (US)
Pages (from-to): 1-14
Number of pages: 14
Journal: International Journal of Speech Technology
DOI: 10.1007/s10772-016-9366-0
State: Accepted/In press - Sep 19 2016

Keywords

  • Automated proficiency scoring
  • Boston University Radio News Corpus
  • Brazil’s prosody model
  • Genetic algorithm feature selection
  • Multiple corpora training
  • Suprasegmental measures

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Vision and Pattern Recognition

Cite this

Improved automatic English proficiency rating of unconstrained speech with multiple corpora. / Johnson, David O.; Kang, Okim; Ghanem, Romy.

In: International Journal of Speech Technology, 19.09.2016, p. 1-14.

@article{d7c8e471127e40478d94f3c047ab9ed9,
title = "Improved automatic English proficiency rating of unconstrained speech with multiple corpora",
abstract = "The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of that software’s algorithms: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech from a Pearson’s correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech and approaches that of human raters in terms of inter-rater reliability.",
keywords = "Automated proficiency scoring, Boston University Radio News Corpus, Brazil’s prosody model, Genetic algorithm feature selection, Multiple corpora training, Suprasegmental measures",
author = "Johnson, {David O.} and Okim Kang and Romy Ghanem",
year = "2016",
month = "9",
day = "19",
doi = "10.1007/s10772-016-9366-0",
language = "English (US)",
pages = "1--14",
journal = "International Journal of Speech Technology",
issn = "1381-2416",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - Improved automatic English proficiency rating of unconstrained speech with multiple corpora

AU - Johnson, David O.

AU - Kang, Okim

AU - Ghanem, Romy

PY - 2016/9/19

Y1 - 2016/9/19

N2 - The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of that software’s algorithms: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech from a Pearson’s correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech and approaches that of human raters in terms of inter-rater reliability.

KW - Automated proficiency scoring

KW - Boston University Radio News Corpus

KW - Brazil’s prosody model

KW - Genetic algorithm feature selection

KW - Multiple corpora training

KW - Suprasegmental measures

UR - http://www.scopus.com/inward/record.url?scp=84988345743&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988345743&partnerID=8YFLogxK

U2 - 10.1007/s10772-016-9366-0

DO - 10.1007/s10772-016-9366-0

M3 - Article

SP - 1

EP - 14

JO - International Journal of Speech Technology

JF - International Journal of Speech Technology

SN - 1381-2416

ER -