Automatic prominent syllable detection with machine learning classifiers

David O. Johnson, Okim Kang

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

In this paper, we examine the performance of automatically detecting Brazil’s prominent syllables using five machine learning classifiers and seven feature sets built from three features (pitch, intensity, and duration) taken one at a time, two at a time, and all three together. Prominent syllables are the foundation of Brazil’s prosodic intonation model. We found that using pitch, intensity, and duration together as features produces the best results. Our findings also revealed that, in terms of accuracy, F-measure, and Cohen’s kappa coefficient, bagging an ensemble of decision tree learners performed best (accuracy = 95.9 ± 0.2 %; F-measure = 93.7 ± 0.4; κ = 0.907 ± 0.005). The performance of our current model proves significantly better than that of any other existing automatic detection software or of human expert transcribers of prosody.
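As an illustration only, and not the authors' code, the seven feature sets described in the abstract are the non-empty subsets of the three features, and Cohen's kappa is the observed agreement corrected for chance agreement. The sketch below assumes plain Python lists of labels; the names `feature_sets` and `cohen_kappa` are hypothetical helpers introduced here.

```python
from collections import Counter
from itertools import combinations

# The three acoustic features named in the abstract.
FEATURES = ("pitch", "intensity", "duration")


def feature_sets(features=FEATURES):
    """Return every non-empty subset of the features, smallest subsets first."""
    return [
        subset
        for size in range(1, len(features) + 1)
        for subset in combinations(features, size)
    ]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)
    # Fraction of items on which the two labelings agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    for subset in feature_sets():
        print(subset)
```

Three features give 3 + 3 + 1 = 7 subsets, which matches the seven feature sets mentioned in the abstract. How the authors actually extracted features or computed their metrics is not stated in this record.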

Original language: English (US)
Pages (from-to): 583-592
Number of pages: 10
Journal: International Journal of Speech Technology
Volume: 18
Issue number: 4
DOI: 10.1007/s10772-015-9299-z
State: Published - Dec 1 2015

Keywords

  • Brazil’s prosodic intonation model
  • Machine learning
  • Prominent syllable detection
  • ToBI

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Software
  • Human-Computer Interaction
  • Language and Linguistics
  • Linguistics and Language

Cite this

Automatic prominent syllable detection with machine learning classifiers. / Johnson, David O.; Kang, Okim.

In: International Journal of Speech Technology, Vol. 18, No. 4, 01.12.2015, p. 583-592.

@article{a377d5953ca44b90a1f5e47bce9109cf,
title = "Automatic prominent syllable detection with machine learning classifiers",
abstract = "In this paper, we examine the performance of automatically detecting Brazil’s prominent syllables using five machine learning classifiers and seven feature sets built from three features (pitch, intensity, and duration) taken one at a time, two at a time, and all three together. Prominent syllables are the foundation of Brazil’s prosodic intonation model. We found that using pitch, intensity, and duration together as features produces the best results. Our findings also revealed that, in terms of accuracy, F-measure, and Cohen’s kappa coefficient, bagging an ensemble of decision tree learners performed best (accuracy = 95.9 ± 0.2 {\%}; F-measure = 93.7 ± 0.4; κ = 0.907 ± 0.005). The performance of our current model proves significantly better than that of any other existing automatic detection software or of human expert transcribers of prosody.",
keywords = "Brazil’s prosodic intonation model, Machine learning, Prominent syllable detection, ToBI",
author = "Johnson, David O. and Kang, Okim",
year = "2015",
month = "12",
day = "1",
doi = "10.1007/s10772-015-9299-z",
language = "English (US)",
volume = "18",
pages = "583--592",
journal = "International Journal of Speech Technology",
issn = "1381-2416",
publisher = "Springer Netherlands",
number = "4"
}

TY - JOUR

T1 - Automatic prominent syllable detection with machine learning classifiers

AU - Johnson, David O.

AU - Kang, Okim

PY - 2015/12/1

Y1 - 2015/12/1

AB - In this paper, we examine the performance of automatically detecting Brazil’s prominent syllables using five machine learning classifiers and seven feature sets built from three features (pitch, intensity, and duration) taken one at a time, two at a time, and all three together. Prominent syllables are the foundation of Brazil’s prosodic intonation model. We found that using pitch, intensity, and duration together as features produces the best results. Our findings also revealed that, in terms of accuracy, F-measure, and Cohen’s kappa coefficient, bagging an ensemble of decision tree learners performed best (accuracy = 95.9 ± 0.2 %; F-measure = 93.7 ± 0.4; κ = 0.907 ± 0.005). The performance of our current model proves significantly better than that of any other existing automatic detection software or of human expert transcribers of prosody.

KW - Brazil’s prosodic intonation model

KW - Machine learning

KW - Prominent syllable detection

KW - ToBI

UR - http://www.scopus.com/inward/record.url?scp=84947485584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947485584&partnerID=8YFLogxK

U2 - 10.1007/s10772-015-9299-z

DO - 10.1007/s10772-015-9299-z

M3 - Article

VL - 18

SP - 583

EP - 592

JO - International Journal of Speech Technology

JF - International Journal of Speech Technology

SN - 1381-2416

IS - 4

ER -