Optimal Statistical Feature Subset Selection for Bearing Fault Detection and Severity Estimation


Bibliographic Details
Main Authors: Chhaya Grover, Neelam Turk
Format: Article
Language: English
Published: Hindawi Limited 2020-01-01
Series: Shock and Vibration
Online Access: http://dx.doi.org/10.1155/2020/5742053
Description
Summary: The performance of machine-learning-based bearing fault detection systems depends largely on the features selected. Hence, a suitable number of dominant features must be selected from a comprehensive list to reduce the number of computations involved in fault detection. In this paper, we evaluate, for the first time, the statistical time-domain features Hjorth parameters (activity, mobility, and complexity) and normal negative log likelihood for a Gaussian mixture model (GMM), alongside 26 established statistical features, for identifying bearing fault type and severity. Two datasets are derived from the publicly available Case Western Reserve University database to assess how well the features identify faults under various fault sizes and motor loads. The features are investigated with a two-step approach: filter-based ranking with 3 metrics, followed by feature subset selection with 11 search techniques. The results indicate that the feature set comprising root mean square, geometric mean, zero crossing rate, the Hjorth parameter mobility, and normal negative log likelihood for GMM outperforms other features. We also compare the diagnostic performance of normal negative log likelihood for GMM with that of the established feature normal negative log likelihood for a single Gaussian. The selected feature subset is validated using ensemble rule-based classifiers, which achieve an average accuracy of 96.75% with the proposed subset and 99.63% with all 30 features. F-measure and G-mean scores are also calculated to investigate performance on datasets with class imbalance. The diagnostic effectiveness of the features is further validated on a bearing dataset obtained from an operating thermal power plant. The results show that the proposed feature subset plays a major role in achieving good classification results and has potential for use in high-dimensional datasets with multidomain features.
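
Illustrative sketch (not from the article): the summary names several time-domain features without defining them. The Python below uses common textbook definitions of the Hjorth parameters, root mean square, and zero crossing rate on a sampled vibration signal. The function names, the population-variance convention, and the sign-change rule for zero crossings are assumptions made for illustration; the article's exact formulations may differ.

```python
import math


def _variance(x):
    """Population variance of a sequence (assumed convention)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def _diff(x):
    """First-order difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) using the usual definitions.

    activity   = var(x)
    mobility   = sqrt(var(x') / var(x))
    complexity = mobility(x') / mobility(x)
    """
    dx = _diff(x)
    ddx = _diff(dx)
    activity = _variance(x)
    mobility = math.sqrt(_variance(dx) / activity)
    complexity = math.sqrt(_variance(ddx) / _variance(dx)) / mobility
    return activity, mobility, complexity


def root_mean_square(x):
    """Square root of the mean squared amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)
```

For a pure sinusoid, these definitions give a complexity close to 1 and a mobility close to the angular frequency per sample, which makes a sine wave a convenient sanity check before applying the functions to measured bearing signals.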
ISSN: 1070-9622
1875-9203