A Brain-Inspired Hyperdimensional Computing Approach for Classifying Massive DNA Methylation Data of Cancer

Recent advances in cancer genomics have put the spotlight on DNA methylation, an epigenetic modification that regulates the functioning of the genome and whose alterations play an important role in tumorigenesis and tumor suppression. Because of the high dimensionality and the enormous amou...


Bibliographic Details
Main Authors: Fabio Cumbo, Eleonora Cappelli, Emanuel Weitschek
Format: Article
Language: English
Published: MDPI AG 2020-09-01
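Series: Algorithms
Subjects: algorithms in biology; bioinformatics; machine learning; classification; hyperdimensional computing; cancer
Online Access: https://www.mdpi.com/1999-4893/13/9/233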
id doaj-cabeef4dcd5d440089ebc8193066c0ad
record_format Article
spelling doaj-cabeef4dcd5d440089ebc8193066c0ad
2020-11-25T03:37:38Z
eng
MDPI AG
Algorithms
1999-4893
2020-09-01
13
233
233
10.3390/a13090233
A Brain-Inspired Hyperdimensional Computing Approach for Classifying Massive DNA Methylation Data of Cancer
Fabio Cumbo (0)
Eleonora Cappelli (1)
Emanuel Weitschek (2)
(0) Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Via Sommarive 9, 38123 Povo Trento, Italy
(1) Department of Engineering, University of Roma Tre, Via della Vasca Navale 79/81, 00146 Rome, Italy
(2) Department of Engineering, Uninettuno University, Corso Vittorio Emanuele II 39, 00186 Rome, Italy
https://www.mdpi.com/1999-4893/13/9/233
algorithms in biology
bioinformatics
machine learning
classification
hyperdimensional computing
cancer
collection DOAJ
language English
format Article
sources DOAJ
author Fabio Cumbo
Eleonora Cappelli
Emanuel Weitschek
title A Brain-Inspired Hyperdimensional Computing Approach for Classifying Massive DNA Methylation Data of Cancer
publisher MDPI AG
series Algorithms
issn 1999-4893
publishDate 2020-09-01
description Recent advances in cancer genomics have put the spotlight on DNA methylation, an epigenetic modification that regulates the functioning of the genome and whose alterations play an important role in tumorigenesis and tumor suppression. Because of the high dimensionality and the enormous amount of genomic data produced by the latest advances in Next Generation Sequencing, it is very challenging to use DNA methylation data effectively in diagnostic applications, e.g., in distinguishing healthy from diseased samples. Additionally, state-of-the-art techniques are neither fast enough to rapidly produce reliable results nor efficient in managing such massive amounts of data. For this reason, we propose HD-classifier, an in-memory, cognitive-based hyperdimensional (HD) supervised machine learning algorithm for classifying tumor vs. non-tumor samples through the analysis of their DNA methylation data. The approach takes inspiration from how the human brain is able to remember and distinguish simple and complex concepts, adopting hypervectors rather than single numerical values. As in the brain, this allows complex patterns to be encoded, which makes the whole architecture robust to failures and mistakes, even with noisy data. We design and develop an algorithm and a software tool able to perform supervised classification with the HD approach. To assess the validity of our algorithm, we conduct experiments on three DNA methylation datasets of different types of cancer, i.e., Breast Invasive Carcinoma (BRCA), Kidney renal papillary cell carcinoma (KIRP), and Thyroid carcinoma (THCA). We obtain outstanding results in terms of accuracy and computational time with a low amount of computational resources. Furthermore, we validate our approach by comparing it to (i) BIGBIOCL, a software tool based on Random Forest for classifying big omics datasets in distributed computing environments, (ii) Support Vector Machine (SVM), and (iii) Decision Tree, all state-of-the-art classification methods. Finally, we freely release both the datasets and the software on GitHub.
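The idea of classifying with hypervectors instead of single numerical values, as summarized in the description, can be illustrated with a small sketch. This is not the authors' HD-classifier: it is a generic hyperdimensional scheme under stated assumptions. The vector size `DIM`, the number of quantization levels `LEVELS`, the feature count `N_FEATURES`, the ID/level binding encoder, and the synthetic "tumor" and "normal" samples are all hypothetical choices made only for illustration.

```python
import random

DIM = 2000        # hypervector size (assumed; HD systems often use about 10,000)
LEVELS = 10       # quantization levels for values in [0, 1] (assumed)
N_FEATURES = 21   # toy number of methylation sites (assumed)

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def make_level_hvs():
    """Level hypervectors: neighbouring levels differ in only a few positions."""
    order = list(range(DIM))
    rng.shuffle(order)
    step = DIM // (2 * (LEVELS - 1))  # positions flipped between two levels
    hv = random_hv()
    levels = [hv[:]]
    for k in range(LEVELS - 1):
        for idx in order[k * step:(k + 1) * step]:
            hv[idx] = -hv[idx]
        levels.append(hv[:])
    return levels


def bundle(hvs):
    """Combine hypervectors by element-wise majority vote (ties go to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]


def similarity(a, b):
    """Cosine similarity; for bipolar vectors this is the dot product / DIM."""
    return sum(x * y for x, y in zip(a, b)) / DIM


FEATURE_HVS = [random_hv() for _ in range(N_FEATURES)]  # one ID per feature
LEVEL_HVS = make_level_hvs()


def encode(sample):
    """Bind each feature ID with its value's level vector, then bundle."""
    bound = []
    for feature_hv, value in zip(FEATURE_HVS, sample):
        level = min(LEVELS - 1, max(0, int(value * LEVELS)))
        level_hv = LEVEL_HVS[level]
        bound.append([f * l for f, l in zip(feature_hv, level_hv)])
    return bundle(bound)


def toy_sample(center):
    """Synthetic sample: values scattered around `center` (made-up data)."""
    return [center + rng.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]


# "Training": one prototype hypervector per class, built by bundling.
train = {
    "tumor": [toy_sample(0.8) for _ in range(5)],
    "normal": [toy_sample(0.2) for _ in range(5)],
}
PROTOTYPES = {
    label: bundle([encode(s) for s in samples])
    for label, samples in train.items()
}


def classify(sample):
    """Return the label whose prototype is most similar to the sample."""
    hv = encode(sample)
    return max(PROTOTYPES, key=lambda label: similarity(hv, PROTOTYPES[label]))


print(classify(toy_sample(0.8)), classify(toy_sample(0.2)))
```

Because every sample is spread over thousands of components, flipping a small fraction of them barely changes its similarity to the class prototypes, which is one way to read the robustness to noise that the description mentions.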
topic algorithms in biology
bioinformatics
machine learning
classification
hyperdimensional computing
cancer
url https://www.mdpi.com/1999-4893/13/9/233