Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification

Bibliographic Details
Main Authors: Xinping Xiao, Dian Fu, Yu Shi, Jianghui Wen
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Computational Intelligence and Neuroscience
Online Access: http://dx.doi.org/10.1155/2020/4609423
ISSN: 1687-5265, 1687-5273
Affiliation: School of Science, Wuhan University of Technology, Wuhan 430070, China (all authors)
Collection: DOAJ

Full description
The Mahalanobis–Taguchi system (MTS) is a multivariate data diagnosis and prediction technique that is widely used for large-sample or imbalanced data but has rarely been applied to high-dimensional, small-sample data. This paper optimizes MTS for classifying high-dimensional, small-sample data by addressing two problems: the instability of the inverse of the covariance matrix and the instability of feature selection. First, a modified Mahalanobis metric based on regularization and smoothing techniques is proposed to reduce the effect of the unstable inverse matrix under small-sample conditions. Second, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into MTS to address the instability of feature selection. Combining mRMR with the signal-to-noise ratio (SNR) yields a two-stage feature selection method: mRMR first removes noisy and redundant variables, and the orthogonal table and SNR then screen for the combination of variables that contributes most to classification. The feasibility and simplicity of the optimized MTS are then demonstrated on five datasets from the UCI database.

The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method makes feature selection for MTS more effective. Finally, the optimized MTS is applied to email classification on the Spambase dataset, where it outperforms the classical MTS and three other machine learning algorithms.
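The abstract says RS-MD regularizes the Mahalanobis metric so that the inverse matrix stays stable when samples are scarce, but it does not give the formula. The sketch below is a minimal illustration under an assumption: it builds the classical MTS reference space (standardize the normal group, form its correlation matrix R, score a sample by the scaled distance MD = zᵀR⁻¹z / k) and then substitutes a generic ridge-type shrinkage R_λ = (1 - λ)R + λI, which is not necessarily the paper's regularization; the smoothing step is not reproduced. The names `fit_reference`, `scaled_md`, and `lam` are hypothetical, and only the standard library is used.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Stats = List[Tuple[float, float]]  # (mean, sample std) per variable


def _invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_reference(normal: Matrix, lam: float = 0.0) -> Tuple[Stats, Matrix]:
    """Build the MTS reference space from the normal group.

    lam = 0 gives the classical correlation matrix; 0 < lam <= 1 shrinks it
    toward the identity (a hypothetical stand-in for the paper's RS-MD).
    """
    n, k = len(normal), len(normal[0])
    stats = []
    for col in zip(*normal):
        m = sum(col) / n
        s = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))  # must be > 0
        stats.append((m, s))
    z = [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in normal]
    corr = [[sum(z[r][i] * z[r][j] for r in range(n)) / (n - 1) for j in range(k)]
            for i in range(k)]
    reg = [[(1 - lam) * corr[i][j] + (lam if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    return stats, _invert(reg)


def scaled_md(x: Sequence[float], stats: Stats, inv: Matrix) -> float:
    """Scaled Mahalanobis distance MD = z^T R^{-1} z / k."""
    z = [(v - m) / s for v, (m, s) in zip(x, stats)]
    k = len(z)
    return sum(z[i] * inv[i][j] * z[j] for i in range(k) for j in range(k)) / k


if __name__ == "__main__":
    normal = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.9], [0.9, 2.1, 0.4],
              [1.1, 2.3, 0.3], [1.3, 1.9, 0.8]]
    stats, inv = fit_reference(normal)
    # Average over the normal group is (n - 1) / n = 0.8 up to rounding.
    print(sum(scaled_md(r, stats, inv) for r in normal) / len(normal))
    print(scaled_md([2.0, 1.0, 2.0], stats, inv))  # far from the normal group

    # n = 3 samples, k = 4 variables: the unshrunk correlation matrix has rank
    # at most 2 and cannot be inverted; with lam > 0 it is positive definite.
    tiny = [[1.0, 0.2, 3.1, 5.0], [1.4, 0.1, 2.9, 5.5], [0.8, 0.4, 3.3, 4.7]]
    stats_t, inv_t = fit_reference(tiny, lam=0.2)
    print(scaled_md([1.1, 0.3, 3.0, 5.1], stats_t, inv_t))
```

With lam = 0 the average scaled MD of the normal group equals (n - 1)/n, a quick sanity check on the reference space. The three-sample, four-variable case shows the small-sample problem the abstract describes: more variables than samples make the sample correlation matrix singular, and shrinkage is one generic way to keep it invertible.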
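In the first stage of the two-stage selection, mRMR ranks variables by their mutual information with the class label (relevance) and penalizes mutual information with variables already chosen (redundancy). Below is a minimal greedy sketch. It assumes the features are already discretized (continuous UCI attributes would need binning first) and uses the common difference form "relevance minus mean redundancy". The abstract does not state which mRMR variant or discretization the paper uses, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def mutual_info(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features: List[Sequence[Hashable]], labels: Sequence[Hashable], m: int) -> List[int]:
    """Greedy mRMR, difference (MID) form: relevance minus mean redundancy."""
    relevance = [mutual_info(f, labels) for f in features]
    # Start with the single most relevant feature (first one wins ties).
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(m, len(features)):
        best, best_score = -1, -math.inf
        for i, f in enumerate(features):
            if i in selected:
                continue
            redundancy = sum(mutual_info(f, features[j]) for j in selected) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]  # strongly relevant
    f1 = list(f0)                  # exact copy of f0: fully redundant
    f2 = [0, 0, 1, 0, 1, 1, 1, 0]  # weaker but complementary
    f3 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of y
    print(mrmr([f0, f1, f2, f3], y, 2))  # [0, 2]: the copy f1 is skipped
```

In the toy data, f1 is exactly as relevant as f0. Once f0 is selected, however, f1's redundancy outweighs its relevance, so the weaker but complementary f2 is picked second. This is the noise-and-redundancy filtering the abstract assigns to the first stage.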
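The second stage screens the surviving variables with an orthogonal table and the SNR. The sketch below follows the classical MTS convention, which is assumed here rather than taken from the paper. Each column of a two-level L8(2^7) orthogonal array is assigned to one variable (level 1 = used, level 2 = dropped), and each run computes the larger-the-better SNR from the abnormal samples' MDs. A variable is kept when its gain, the mean SNR at level 1 minus the mean SNR at level 2, is positive. `md_of_subset` is a hypothetical callback that would wrap an MD routine such as RS-MD. Because L8 covers at most seven variables, larger candidate sets would need a bigger array such as L12 or L16.

```python
import math
from typing import Callable, List, Sequence, Tuple


def l8_array() -> List[List[int]]:
    """Two-level L8(2^7) orthogonal array built from the 7 nonzero XOR
    combinations of three bits; level 1 = variable used, level 2 = dropped."""
    rows = []
    for r in range(8):
        a, b, c = (r >> 2) & 1, (r >> 1) & 1, r & 1
        bits = [a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c]
        rows.append([bit + 1 for bit in bits])
    return rows


def larger_the_better_snr(mds: Sequence[float]) -> float:
    """eta = -10 log10((1/t) * sum_j 1/MD_j) over t abnormal samples (MD = scaled D^2)."""
    return -10.0 * math.log10(sum(1.0 / md for md in mds) / len(mds))


def screen_variables(md_of_subset: Callable[[List[int]], List[float]],
                     n_vars: int) -> Tuple[List[float], List[int]]:
    """Return per-variable SNR gains and the indices with positive gain.

    md_of_subset(used) must return the abnormal samples' MDs computed with a
    reference space restricted to the variable indices in `used`.
    """
    if not 3 <= n_vars <= 7:
        raise ValueError("this L8 sketch handles 3 to 7 variables")
    # With 3+ columns, no run drops every variable, so `used` is never empty.
    oa = [row[:n_vars] for row in l8_array()]
    snr = [larger_the_better_snr(md_of_subset([i for i, lv in enumerate(row) if lv == 1]))
           for row in oa]
    gains = []
    for i in range(n_vars):
        on = [s for row, s in zip(oa, snr) if row[i] == 1]
        off = [s for row, s in zip(oa, snr) if row[i] == 2]
        gains.append(sum(on) / len(on) - sum(off) / len(off))
    return gains, [i for i, g in enumerate(gains) if g > 0]


if __name__ == "__main__":
    # Hypothetical MD callback: variables 0 and 3 separate the groups, the
    # other five only add noise and slightly shrink the abnormal MDs.
    def toy_md(used: List[int]) -> List[float]:
        useful = sum(1 for i in used if i in (0, 3))
        base = 2.0 + 4.0 * useful - 0.2 * (len(used) - useful)
        return [base, 1.5 * base]

    gains, keep = screen_variables(toy_md, 7)
    print([round(g, 2) for g in gains])
    print(keep)  # [0, 3]
```

Every pair of columns is balanced. As a result, each gain compares four runs that include the variable with four that omit it, and the other variables are spread evenly across both sets. With the toy callback, only variables 0 and 3 end up with a positive gain.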