Summary: | Abstract Background Detection and quantification of cyclic alternating patterns (CAP) components has the potential to serve as a disease bio-marker. Few methods exist to discriminate all the different CAP components, they do not present appropriate sensitivities, and often they are evaluated based on accuracy (AC) that is not an appropriate measure for imbalanced datasets. Methods We describe a knowledge discovery methodology in data (KDD) aiming the development of automatic CAP scoring approaches. Automatic CAP scoring was faced from two perspectives: the binary distinction between A-phases and B-phases, and also for multi-class classification of the different CAP components. The most important KDD stages are: extraction of 55 features, feature ranking/transformation, and classification. Classification is performed by (i) support vector machine (SVM), (ii) k-nearest neighbors (k-NN), and (iii) discriminant analysis. We report the weighted accuracy (WAC) that accounts for class imbalance. Results The study includes 30 subjects from the CAP Sleep Database of Physionet. The best alternative for the discrimination of the different A-phase subtypes involved feature ranking by the minimum redundancy maximum relevance algorithm (mRMR) and classification by SVM, with a WAC of 51%. Concerning the binary discrimination between A-phases and B-phases, k-NN with mRMR ranking achieved the best WAC of 80%. Conclusions We describe a KDD that, to the best of our knowledge, was for the first time applied to CAP scoring. In particular, the fully discrimination of the three different A-phases subtypes is a new perspective, since past works tried multi-class approaches but based on grouping of different sub-types. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. Globally, better subtype sensitivities than other published approaches were achieved.
|