Feature Extension of Gut Microbiome Data for Deep Neural Network-Based Colorectal Cancer Classification

Bibliographic Details
Main Authors: Mwenge Mulenga, Sameem Abdul Kareem, Aznul Qalid Md Sabri, Manjeevan Seera, Suresh Govind, Chandramathi Samudi, Saharuddin Bin Mohamad
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9319639/
Description
Summary: Colorectal cancer (CRC) is the third deadliest cancer worldwide. The use of the gut microbiome for early detection of the disease has attracted much attention from the research community, mainly because of its noninvasive nature. Recent advances in next-generation sequencing technology have increased the availability of sequence data and created an environment conducive to the growth of gut microbiome research. The use of conventional machine learning algorithms for automatic microbiome-based detection of CRC is limited by factors such as low accuracy and the need for manual feature selection. Despite their success in other fields, Deep Neural Network (DNN) algorithms have limitations in microbiome-based CRC classification. These limitations include the high dimensionality of microbiome data and other characteristics of sequence data, such as feature dominance. In this paper, we propose a feature augmentation approach that aggregates data normalization methods to extend the existing features of a dataset. The proposed method combines feature extension with data augmentation to improve the CRC classification performance of a DNN model. The proposed model obtained area under the curve (AUC) scores of 0.96 and 0.89 on two publicly available microbiome datasets.
ISSN: 2169-3536
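The feature extension idea in the summary (aggregating several data normalization methods to widen an existing feature set) can be illustrated with a short sketch. This is a hypothetical example, not the authors' implementation: it assumes a samples-by-taxa count matrix, and the four normalizers used here (total-sum scaling, log1p, per-feature min-max scaling and per-feature z-scoring), along with helper names such as `extend_features`, are illustrative choices. Each normalizer produces a transformed copy of the data, and the copies are concatenated column-wise, so d taxa and k normalizers yield k·d features.

```python
import numpy as np

# Hypothetical sketch of "feature extension": several normalized views of the
# same abundance matrix are concatenated column-wise. The choice of
# normalizers below is an illustrative assumption, not the paper's exact set.


def total_sum_scaling(x: np.ndarray) -> np.ndarray:
    """Relative abundance: divide each sample (row) by its total count."""
    totals = x.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty samples
    return x / totals


def log_transform(x: np.ndarray) -> np.ndarray:
    """log(1 + x), which keeps zero counts at zero."""
    return np.log1p(x)


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Rescale each feature (column) to the range [0, 1]."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0
    return (x - col_min) / col_range


def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize each feature (column) to zero mean and unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns map to 0
    return (x - mean) / std


def extend_features(counts, normalizers) -> np.ndarray:
    """Apply every normalizer to the counts and stack the results side by side."""
    x = np.asarray(counts, dtype=float)
    return np.hstack([normalize(x) for normalize in normalizers])


# Toy data: 4 samples x 3 taxa (made-up counts for illustration only).
counts = np.array([
    [10, 0, 5],
    [3, 7, 0],
    [0, 2, 8],
    [6, 6, 6],
])
extended = extend_features(
    counts, [total_sum_scaling, log_transform, min_max_scale, z_score]
)
print(extended.shape)  # → (4, 12)
```

In a real pipeline, the column-wise statistics used by min-max scaling and z-scoring would normally be estimated on the training split only and then applied to held-out samples, so that no information from the evaluation data leaks into the extended features.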