A Hybridized Feature Selection and Extraction Approach for Enhancing Cancer Prediction Based on DNA Methylation

Due to the vital role of the aberrant DNA methylation during the disease development such as cancer, the comprehension of its mechanism had become essential in the recent years for early detection and diagnosis. With the advent of the high-throughput technologies, there are still several challenges...

Full description

Bibliographic Details
Main Authors: Abeer A. Raweh, Mohammed Nassef, Amr Badr
Format: Article
Language:English
Published: IEEE 2018-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/8307050/
Description
Summary:Due to the vital role of the aberrant DNA methylation during the disease development such as cancer, the comprehension of its mechanism had become essential in the recent years for early detection and diagnosis. With the advent of the high-throughput technologies, there are still several challenges to achieve the classification process using the DNA methylation data. The high-dimensionality and high-noisiness of the DNA methylation data may lead to the degradation of the prediction accuracy. Thus, it becomes increasingly important in a wide range to employ robust computational tools such as feature selection and extraction methods to extract the informative features amongst thousands of them, and hence improving cancer prediction. By using the DNA methylation degree in promoters and probes regions, this paper aims at predicting cancer with a hybridized approach based on the feature selection and feature extraction techniques. The suggested approach exploits a filter feature selection method called (F-score) to overcome the high-dimensionality problem of the DNA methylation data, and proposes an extraction model which employs the peaks of the mean methylation density, the fast Fourier transform algorithm, and the symmetry between the methylation density of a sample and the mean methylation density of both sample types normal and cancer as novel feature extraction methods, in order to accurate cancer classification and reduce training time. To evaluate the reliability of our approach, The naïve base, random forest, and support vector machine algorithms are introduced to predict different cancer types like: breast, colon, head, kidney, lung, thyroid, and uterine with and without the hybridized approach. The results show that, the classification accuracy improves in all most cases and it also proves the reliability indirectly.
ISSN:2169-3536