Clustering of Biological Datasets in the Era of Big Data

Clustering is a long-standing problem in computer science and is applied in virtually any scientific field for exploring the inherent structure of datasets. In biomedical research, clustering tools have been utilized in manifold areas, among many others in expression analysis, disease subtyping or p...

Bibliographic Details
Main Author: Röttger Richard
Format: Article
Language: English
Published: De Gruyter 2016-03-01
Series: Journal of Integrative Bioinformatics
Online Access: https://doi.org/10.1515/jib-2016-300
id doaj-327bc315f1b64120b3c9ceb16ee995d1
record_format Article
spelling doaj-327bc315f1b64120b3c9ceb16ee995d1; 2021-09-06T19:40:32Z; eng; De Gruyter; Journal of Integrative Bioinformatics; 1613-4516; 2016-03-01; 13; 1; 52; 81; 10.1515/jib-2016-300; jib-2016-300; Clustering of Biological Datasets in the Era of Big Data; Röttger Richard (Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark; http://imada.sdu.dk/~roettger/)
collection DOAJ
language English
format Article
sources DOAJ
author Röttger Richard
title Clustering of Biological Datasets in the Era of Big Data
publisher De Gruyter
series Journal of Integrative Bioinformatics
issn 1613-4516
publishDate 2016-03-01
description Clustering is a long-standing problem in computer science and is applied in virtually any scientific field for exploring the inherent structure of datasets. In biomedical research, clustering tools have been utilized in manifold areas, among many others in expression analysis, disease subtyping or protein research. A plethora of different approaches has been developed, but there is little guidance on which approach is optimal in which situation. Furthermore, a typical cluster analysis is an entire process with several highly interconnected steps, from preprocessing and proximity calculation through the actual clustering to evaluation and optimization. An optimal result can be achieved only when all steps work together seamlessly. This makes a cluster analysis tiresome and error-prone, especially for non-experts. A mere trial-and-error approach becomes increasingly infeasible given the tremendous growth of available datasets; thus, a strategic and thoughtful course of action is crucial for a cluster analysis. This manuscript provides an overview of the crucial steps and the most common techniques involved in conducting a state-of-the-art cluster analysis of biomedical datasets.
url https://doi.org/10.1515/jib-2016-300