Optimizing the Estimation of a Histogram-Bin Width—Application to the Multivariate Mixture-Model Estimation

A maximum-likelihood estimation of a multivariate mixture model’s parameters is a difficult problem. One approach is to combine the REBMIX and EM algorithms. However, the REBMIX algorithm requires the use of histogram estimation, which is the most rudimentary approach to an empirical density estimat...

Bibliographic Details
Main Authors: Branislav Panić, Jernej Klemenc, Marko Nagode
Format: Article
Language: English
Published: MDPI AG 2020-07-01
Series: Mathematics
Subjects: EM
Online Access: https://www.mdpi.com/2227-7390/8/7/1090
id doaj-4384a050b812479caf6fb4aa49bd26c8
record_format Article
spelling doaj-4384a050b812479caf6fb4aa49bd26c8
2020-11-25T03:34:24Z
eng
MDPI AG
Mathematics
2227-7390
2020-07-01
8
1090
1090
10.3390/math8071090
Optimizing the Estimation of a Histogram-Bin Width—Application to the Multivariate Mixture-Model Estimation
Branislav Panić: Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia
Jernej Klemenc: Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia
Marko Nagode: Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia
A maximum-likelihood estimation of a multivariate mixture model’s parameters is a difficult problem. One approach is to combine the REBMIX and EM algorithms. However, the REBMIX algorithm requires the use of histogram estimation, which is the most rudimentary approach to an empirical density estimation and has many drawbacks. Nevertheless, because of its simplicity, it is still one of the most commonly used techniques. The main problem is to estimate the optimum histogram-bin width, which is usually set by the number of non-overlapping, regularly spaced bins. For univariate problems it is usually denoted by an integer value; i.e., the number of bins. However, for multivariate problems, in order to obtain a histogram estimation, a regular grid must be formed. Thus, to obtain the optimum histogram estimation, an integer-optimization problem must be solved. The aim is therefore the estimation of optimum histogram binning, alone and in application to the mixture model parameter estimation with the REBMIX&EM strategy. As an estimator, the Knuth rule was used. For the optimization algorithm, a derivative based on the coordinate-descent optimization was composed. These proposals yielded promising results. The optimization algorithm was efficient and the results were accurate. When applied to the multivariate, Gaussian-mixture-model parameter estimation, the results were competitive. All the improvements were implemented in the rebmix R package.
https://www.mdpi.com/2227-7390/8/7/1090
histogram
integer optimization
parameter estimation
EM
REBMIX
mixture model
collection DOAJ
language English
format Article
sources DOAJ
author Branislav Panić
Jernej Klemenc
Marko Nagode
title Optimizing the Estimation of a Histogram-Bin Width—Application to the Multivariate Mixture-Model Estimation
publisher MDPI AG
series Mathematics
issn 2227-7390
publishDate 2020-07-01
description A maximum-likelihood estimation of a multivariate mixture model’s parameters is a difficult problem. One approach is to combine the REBMIX and EM algorithms. However, the REBMIX algorithm requires the use of histogram estimation, which is the most rudimentary approach to an empirical density estimation and has many drawbacks. Nevertheless, because of its simplicity, it is still one of the most commonly used techniques. The main problem is to estimate the optimum histogram-bin width, which is usually set by the number of non-overlapping, regularly spaced bins. For univariate problems it is usually denoted by an integer value; i.e., the number of bins. However, for multivariate problems, in order to obtain a histogram estimation, a regular grid must be formed. Thus, to obtain the optimum histogram estimation, an integer-optimization problem must be solved. The aim is therefore the estimation of optimum histogram binning, alone and in application to the mixture model parameter estimation with the REBMIX&EM strategy. As an estimator, the Knuth rule was used. For the optimization algorithm, a derivative based on the coordinate-descent optimization was composed. These proposals yielded promising results. The optimization algorithm was efficient and the results were accurate. When applied to the multivariate, Gaussian-mixture-model parameter estimation, the results were competitive. All the improvements were implemented in the rebmix R package.
topic histogram
integer optimization
parameter estimation
EM
REBMIX
mixture model
url https://www.mdpi.com/2227-7390/8/7/1090