An improved clustering algorithm based on finite Gaussian mixture model

Zhilin He, Chun-Hsing Ho

Research output: Contribution to journal › Article

Abstract

The Finite Gaussian Mixture Model (FGMM) is the most commonly used model for describing mixed density distributions in cluster analysis. An important feature of the FGMM is that it can approximate any continuous distribution arbitrarily closely, provided the model contains enough components. In FGMM-based cluster analysis, the EM algorithm is usually used to estimate the model parameters; its advantages are stable computation and fast convergence. However, the EM algorithm relies heavily on estimates made from the incomplete (observed) data and does not use any additional information to reduce the uncertainty of the missing data. To solve this problem, an EM algorithm based on entropy-penalized maximum likelihood estimation is proposed. The novel algorithm constructs a conditional entropy model between the incomplete data and the missing data, and uses the incomplete data to reduce the uncertainty of the missing data. Theoretical analysis and experimental results show that the novel algorithm adapts effectively to the FGMM, improves the clustering results, and improves the efficiency of the algorithm.
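
The abstract does not give the penalized objective, so the following is only a minimal sketch of one common way to penalize the entropy of the missing (latent) component labels, not the authors' algorithm. It assumes the per-point objective E_q[log p(x, z)] + (1 - lambda) H(q) with 0 <= lambda < 1, whose optimal q is the ordinary posterior raised to the power beta = 1 / (1 - lambda) and renormalized. The names `em_gmm_1d`, `mean_entropy`, and `beta` are hypothetical, and the model is restricted to one dimension for brevity.

```python
import numpy as np


def log_gauss(x, mu, var):
    # Log density of N(mu, var) at x; inputs broadcast to shape (n, k).
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def em_gmm_1d(x, k, beta=1.0, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture with tempered responsibilities.

    beta = 1 is standard EM. beta > 1 sharpens the posteriors, which is
    the assumed entropy penalty (illustrative, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = rng.choice(x, size=k, replace=False)  # initial means: random data points
    var = np.full(k, x.var())                  # initial variances: overall variance
    pi = np.full(k, 1.0 / k)                   # initial weights: uniform
    for _ in range(n_iter):
        # E-step: log joint log(pi_k) + log N(x_i | mu_k, var_k), then temper.
        log_joint = np.log(pi) + log_gauss(x[:, None], mu, var)
        log_r = beta * log_joint
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize the exponent
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var, r


def mean_entropy(r):
    # Average entropy (in nats) of the per-point responsibility vectors.
    return float(-(r * np.log(r + 1e-12)).sum(axis=1).mean())
```

Calling `em_gmm_1d(x, k=2, beta=1.0)` and `em_gmm_1d(x, k=2, beta=2.0)` on the same data and comparing `mean_entropy` of the returned responsibilities shows how the penalty reduces uncertainty about the component labels. A large `beta` approaches hard assignment, so moderate values are the safer starting point.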

Language: English (US)
Journal: Multimedia Tools and Applications
DOI: 10.1007/s11042-018-6988-z
State: Accepted/In press - Jan 1 2018

Fingerprint

Clustering algorithms
Entropy
Maximum likelihood estimation
Information use
Cluster analysis

Keywords

  • Cluster analysis
  • EM algorithm
  • Gaussian mixture model

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

An improved clustering algorithm based on finite Gaussian mixture model. / He, Zhilin; Ho, Chun-Hsing.

In: Multimedia Tools and Applications, 01.01.2018.

@article{4a31b8cbfba64734b17b35a9fb2a19bf,
title = "An improved clustering algorithm based on finite Gaussian mixture model",
abstract = "The Finite Gaussian Mixture Model (FGMM) is the most commonly used model for describing mixed density distribution in cluster analysis. An important feature of the FGMM is that it can infinitely approximate any continuous distribution, as long as the model contains enough number of components. In the clustering analysis based on the FGMM, the EM algorithm is usually used to estimate the parameters of the model. The advantage is that the computation is stable and the convergence speed is fast. However, the EM algorithm relies heavily on the estimation of incomplete data. It does not use any information to reduce the uncertainty of missing data. To solve this problem, an EM algorithm based on entropy penalized maximum likelihood estimation is proposed. The novel algorithm constructs the conditional entropy model between incomplete data and missing data, and reduces the uncertainty of missing data through incomplete data. Theoretical analysis and experimental results show that the novel algorithm can effectively adapt to the FGMM, improve the clustering results and improve the efficiency of the algorithm.",
keywords = "Cluster analysis, EM algorithm, Gaussian mixture model",
author = "Zhilin He and Chun-Hsing Ho",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/s11042-018-6988-z",
language = "English (US)",
journal = "Multimedia Tools and Applications",
issn = "1380-7501",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - An improved clustering algorithm based on finite Gaussian mixture model

AU - He, Zhilin

AU - Ho, Chun-Hsing

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The Finite Gaussian Mixture Model (FGMM) is the most commonly used model for describing mixed density distribution in cluster analysis. An important feature of the FGMM is that it can infinitely approximate any continuous distribution, as long as the model contains enough number of components. In the clustering analysis based on the FGMM, the EM algorithm is usually used to estimate the parameters of the model. The advantage is that the computation is stable and the convergence speed is fast. However, the EM algorithm relies heavily on the estimation of incomplete data. It does not use any information to reduce the uncertainty of missing data. To solve this problem, an EM algorithm based on entropy penalized maximum likelihood estimation is proposed. The novel algorithm constructs the conditional entropy model between incomplete data and missing data, and reduces the uncertainty of missing data through incomplete data. Theoretical analysis and experimental results show that the novel algorithm can effectively adapt to the FGMM, improve the clustering results and improve the efficiency of the algorithm.

AB - The Finite Gaussian Mixture Model (FGMM) is the most commonly used model for describing mixed density distribution in cluster analysis. An important feature of the FGMM is that it can infinitely approximate any continuous distribution, as long as the model contains enough number of components. In the clustering analysis based on the FGMM, the EM algorithm is usually used to estimate the parameters of the model. The advantage is that the computation is stable and the convergence speed is fast. However, the EM algorithm relies heavily on the estimation of incomplete data. It does not use any information to reduce the uncertainty of missing data. To solve this problem, an EM algorithm based on entropy penalized maximum likelihood estimation is proposed. The novel algorithm constructs the conditional entropy model between incomplete data and missing data, and reduces the uncertainty of missing data through incomplete data. Theoretical analysis and experimental results show that the novel algorithm can effectively adapt to the FGMM, improve the clustering results and improve the efficiency of the algorithm.

KW - Cluster analysis

KW - EM algorithm

KW - Gaussian mixture model

UR - http://www.scopus.com/inward/record.url?scp=85058856320&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058856320&partnerID=8YFLogxK

U2 - 10.1007/s11042-018-6988-z

DO - 10.1007/s11042-018-6988-z

M3 - Article

JO - Multimedia Tools and Applications

T2 - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

SN - 1380-7501

ER -