Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring

David O. Johnson, Okim Kang

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Four algorithms for syllabifying phones are compared in automatically scoring English oral proficiency. The first algorithm clusters consonants into groups with the vowel nearer to them temporally, taking into account the maximal onset principle. A Hidden Markov Model (HMM) predicts the syllable boundaries based on their sonority value in the second algorithm. The third one employs three HMMs which are tuned to specific categories of utterances. The final algorithm uses a genetic algorithm to identify a set of rules for syllabifying the phones. They were evaluated by: (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. A measure of the temporal alignment of the syllables was utilized to judge how satisfactorily they syllabified utterances. Their suitability in the proficiency process was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores determined by human examiners. We found that syllabification-by-genetic-algorithm performed the best in syllabifying the BURNC, but that syllabification-by-grouping (i.e., syllables are made by grouping non-syllabic consonant phones with the vowel or syllabic consonant phone nearest to them with respect to time) performed the best in the English oral proficiency rating application.
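The grouping idea and the Pearson-based evaluation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Phone` class, the `VOWELS` inventory (ARPAbet-style labels), the use of phone midpoints to measure temporal distance, and the tie-break toward the following nucleus (a rough stand-in for the maximal onset principle) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical nucleus inventory (ARPAbet-style vowel labels); an assumption,
# not the phone set used in the paper.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


@dataclass
class Phone:
    label: str
    start: float  # seconds
    end: float    # seconds

    @property
    def mid(self) -> float:
        return (self.start + self.end) / 2.0


def syllabify_by_grouping(phones: List[Phone]) -> List[List[Phone]]:
    """Attach each non-nucleus phone to the temporally nearest nucleus.

    Phones are assumed to be in time order. Distance is measured between
    phone midpoints; exact ties go to the later nucleus, so the consonant
    becomes an onset (a simplification of maximal onset).
    """
    nuclei = [i for i, p in enumerate(phones) if p.label in VOWELS]
    if not nuclei:
        # No nucleus found: return everything as one group (or nothing).
        return [list(phones)] if phones else []
    syllables: List[List[Phone]] = [[] for _ in nuclei]
    for i, p in enumerate(phones):
        if p.label in VOWELS:
            k = nuclei.index(i)
        else:
            k = min(range(len(nuclei)),
                    key=lambda j: (abs(phones[nuclei[j]].mid - p.mid), -j))
        syllables[k].append(p)
    return syllables


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy input with made-up timings, for illustration only.
phones = [
    Phone("B", 0.00, 0.05), Phone("AH", 0.05, 0.15), Phone("N", 0.15, 0.20),
    Phone("AE", 0.20, 0.40), Phone("N", 0.40, 0.45), Phone("AH", 0.45, 0.55),
]
syllable_labels = [[p.label for p in s] for s in syllabify_by_grouping(phones)]
```

In this sketch `pearson` would be applied to a list of machine-predicted proficiency scores and the matching list of human examiner scores; the lists themselves are not part of the record above and would have to come from the evaluation data.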

Original language: English (US)
Pages (from-to): 1-24
Number of pages: 24
Journal: Artificial Intelligence Review
DOI: 10.1007/s10462-017-9594-y
State: Accepted/In press - Nov 22 2017

Fingerprint

speaking
Genetic algorithms
grouping
radio
news
Hidden Markov models
examiner
Phone
Proficiency
Scoring
rating
Utterance
Values
Group
Syllabification

Keywords

  • ASR phone recognition
  • Automatic speaking proficiency scoring
  • Automatic syllabification
  • Maximal onset principle
  • Sonority sequencing principle

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

@article{59546a4d47d44973b67d7e34f4d50e15,
title = "Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring",
abstract = "Four algorithms for syllabifying phones are compared in automatically scoring English oral proficiency. The first algorithm clusters consonants into groups with the vowel nearer to them temporally, taking into account the maximal onset principle. A Hidden Markov Model (HMM) predicts the syllable boundaries based on their sonority value in the second algorithm. The third one employs three HMMs which are tuned to specific categories of utterances. The final algorithm uses a genetic algorithm to identify a set of rules for syllabifying the phones. They were evaluated by: (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. A measure of the temporal alignment of the syllables was utilized to judge how satisfactorily they syllabified utterances. Their suitability in the proficiency process was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores determined by human examiners. We found that syllabification-by-genetic-algorithm performed the best in syllabifying the BURNC, but that syllabification-by-grouping (i.e., syllables are made by grouping non-syllabic consonant phones with the vowel or syllabic consonant phone nearest to them with respect to time) performed the best in the English oral proficiency rating application.",
keywords = "ASR phone recognition, Automatic speaking proficiency scoring, Automatic syllabification, Maximal onset principle, Sonority sequencing principle",
author = "Johnson, David O. and Kang, Okim",
year = "2017",
month = "11",
day = "22",
doi = "10.1007/s10462-017-9594-y",
language = "English (US)",
pages = "1--24",
journal = "Artificial Intelligence Review",
issn = "0269-2821",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring

AU - Johnson, David O.

AU - Kang, Okim

PY - 2017/11/22

Y1 - 2017/11/22

AB - Four algorithms for syllabifying phones are compared in automatically scoring English oral proficiency. The first algorithm clusters consonants into groups with the vowel nearer to them temporally, taking into account the maximal onset principle. A Hidden Markov Model (HMM) predicts the syllable boundaries based on their sonority value in the second algorithm. The third one employs three HMMs which are tuned to specific categories of utterances. The final algorithm uses a genetic algorithm to identify a set of rules for syllabifying the phones. They were evaluated by: (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. A measure of the temporal alignment of the syllables was utilized to judge how satisfactorily they syllabified utterances. Their suitability in the proficiency process was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores determined by human examiners. We found that syllabification-by-genetic-algorithm performed the best in syllabifying the BURNC, but that syllabification-by-grouping (i.e., syllables are made by grouping non-syllabic consonant phones with the vowel or syllabic consonant phone nearest to them with respect to time) performed the best in the English oral proficiency rating application.

KW - ASR phone recognition

KW - Automatic speaking proficiency scoring

KW - Automatic syllabification

KW - Maximal onset principle

KW - Sonority sequencing principle

UR - http://www.scopus.com/inward/record.url?scp=85034664551&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034664551&partnerID=8YFLogxK

U2 - 10.1007/s10462-017-9594-y

DO - 10.1007/s10462-017-9594-y

M3 - Article

AN - SCOPUS:85034664551

SP - 1

EP - 24

JO - Artificial Intelligence Review

JF - Artificial Intelligence Review

SN - 0269-2821

ER -