Machine Learning and Integrative Analysis of Biomedical Big Data

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical an...

Bibliographic Details
Main Authors:	Bilal Mirza, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, Peipei Ping
Format:	Article
Language:	English
Published:	MDPI AG 2019-01-01
Series:	Genes
Subjects:	machine learning, multi-omics, data integration, curse of dimensionality, heterogeneous data, missing data, class imbalance, scalability
Online Access:	https://www.mdpi.com/2073-4425/10/2/87

id	doaj-e0cca1a821554c2686c348b702be7238
record_format	Article
spelling	doaj-e0cca1a821554c2686c348b702be7238; 2020-11-24T23:56:42Z; eng; MDPI AG; Genes; 2073-4425; 2019-01-01; 10; 2; 87; 10.3390/genes10020087; genes10020087; Machine Learning and Integrative Analysis of Biomedical Big Data; Bilal Mirza, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, Peipei Ping (all authors: NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Bilal Mirza, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, Peipei Ping
author_facet	Bilal Mirza, Wei Wang, Jie Wang, Howard Choi, Neo Christopher Chung, Peipei Ping
author_sort	Bilal Mirza
title	Machine Learning and Integrative Analysis of Biomedical Big Data
title_sort	machine learning and integrative analysis of biomedical big data
publisher	MDPI AG
series	Genes
issn	2073-4425
publishDate	2019-01-01
description	Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
topic	machine learning, multi-omics, data integration, curse of dimensionality, heterogeneous data, missing data, class imbalance, scalability
url	https://www.mdpi.com/2073-4425/10/2/87
