Optimal Statistical Feature Subset Selection for Bearing Fault Detection and Severity Estimation

The performance of bearing fault detection systems based on machine learning techniques largely depends on the selected features. Hence, selection of an ideal number of dominant features from a comprehensive list of features is needed to decrease the number of computations involved in fault detectio...


Bibliographic Details
Main Authors: Chhaya Grover, Neelam Turk
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Shock and Vibration
Online Access: http://dx.doi.org/10.1155/2020/5742053
id doaj-aa13c3196c8d4dd6a59b27db8afcfba0
record_format Article
spelling doaj-aa13c3196c8d4dd6a59b27db8afcfba0 2020-11-25T03:44:06Z eng Hindawi Limited Shock and Vibration 1070-9622 1875-9203 2020-01-01 2020 10.1155/2020/5742053 5742053
Optimal Statistical Feature Subset Selection for Bearing Fault Detection and Severity Estimation
Chhaya Grover (0), Neelam Turk (1)
(0) Department of Electronics Engineering, J.C. Bose University of Science and Technology (YMCA), Sector-6, Faridabad, Haryana 121006, India
(1) Department of Electronics Engineering, J.C. Bose University of Science and Technology (YMCA), Sector-6, Faridabad, Haryana 121006, India
http://dx.doi.org/10.1155/2020/5742053
collection DOAJ
language English
format Article
sources DOAJ
author Chhaya Grover
Neelam Turk
author_facet Chhaya Grover
Neelam Turk
author_sort Chhaya Grover
title Optimal Statistical Feature Subset Selection for Bearing Fault Detection and Severity Estimation
title_sort optimal statistical feature subset selection for bearing fault detection and severity estimation
publisher Hindawi Limited
series Shock and Vibration
issn 1070-9622
1875-9203
publishDate 2020-01-01
description The performance of bearing fault detection systems based on machine learning depends largely on the selected features. Selecting an appropriate number of dominant features from a comprehensive list is therefore needed to reduce the computation involved in fault detection. In this paper, we apply the statistical time-domain features Hjorth parameters (activity, mobility, and complexity) and normal negative log likelihood for a Gaussian mixture model (GMM) to this problem for the first time, alongside 26 other established statistical features, to identify bearing fault type and severity. Two datasets are derived from the publicly available Case Western Reserve University database to assess how well the features identify faults under various fault sizes and motor loads. The features are investigated using a two-step approach: filter-based ranking with 3 metrics, followed by feature subset selection with 11 search techniques. The results indicate that the feature set consisting of root mean square, geometric mean, zero crossing rate, the Hjorth parameter mobility, and normal negative log likelihood for GMM outperforms other features. We also compare the diagnostic performance of normal negative log likelihood for GMM with that of the established feature, normal negative log likelihood for a single Gaussian. The selected feature set is validated using ensemble rule-based classifiers, which achieve an average accuracy of 96.75% with the proposed feature subset and 99.63% with all 30 features. F-measure and G-mean scores are also calculated to assess performance on datasets with class imbalance. The diagnostic effectiveness of the features is further validated on a bearing dataset obtained from an operating thermal power plant. The results show that the proposed feature subset plays a major role in achieving good classification results and has potential for use in high-dimensional datasets with multidomain features.
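As an illustration of three of the time-domain features named in the description (Hjorth parameters, root mean square, and zero crossing rate), the following is a minimal sketch using the commonly cited textbook definitions. The record does not give the authors' exact formulas, window lengths, or normalization, so the function names, the use of population variance, and the per-sample first differences are all assumptions made for this example.

```python
import math
from statistics import pvariance


def first_difference(x):
    """Discrete approximation of the derivative: x[n+1] - x[n]."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) for a 1-D signal.

    Assumed definitions (standard textbook form, not taken from the paper):
      activity   = var(x)
      mobility   = sqrt(var(dx) / var(x))
      complexity = mobility(dx) / mobility(x)
    Differences are taken per sample, so mobility is in per-sample units.
    """
    dx = first_difference(x)
    ddx = first_difference(dx)
    activity = pvariance(x)
    var_dx = pvariance(dx)
    mobility = math.sqrt(var_dx / activity)
    complexity = math.sqrt(pvariance(ddx) / var_dx) / mobility
    return activity, mobility, complexity


def root_mean_square(x):
    """Square root of the mean of the squared samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings / (len(x) - 1)


if __name__ == "__main__":
    # Hypothetical test signal: 10 full periods of a unit sine in 1000 samples.
    n = 1000
    signal = [math.sin(2 * math.pi * 10 * k / n) for k in range(n)]
    print(hjorth_parameters(signal))
    print(root_mean_square(signal), zero_crossing_rate(signal))
```

For a pure sinusoid, complexity under these definitions is close to 1, and it grows as the waveform departs from a sine, which is one reason such features can be sensitive to impulsive fault signatures. A real vibration record would typically be split into windows first, with the features computed per window.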
url http://dx.doi.org/10.1155/2020/5742053