Machine Learning and Integrative Analysis of Biomedical Big Data

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Bibliographic Details
Main Authors: Bilal Mirza, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, Peipei Ping
Affiliation (all authors): NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
Format: Article
Language: English
Published: MDPI AG 2019-01-01
Series: Genes
ISSN: 2073-4425
Volume 10, Issue 2, Article 87
DOI: 10.3390/genes10020087
Subjects: machine learning; multi-omics; data integration; curse of dimensionality; heterogeneous data; missing data; class imbalance; scalability
Online Access: https://www.mdpi.com/2073-4425/10/2/87
Source: DOAJ