Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix)

Abstract Background Unsupervised machine-learned analysis of cluster structures, applied using the emergent self-organizing feature maps (ESOM) combined with the unified distance matrix (U-matrix) has been shown to provide an unbiased method to identify true clusters. It outperforms classical hierar...

Full description

Bibliographic Details
Main Authors: Jörn Lötsch, Florian Lerch, Ruth Djaldetti, Irmgard Tegder, Alfred Ultsch
Format: Article
Language:English
Published: BMC 2018-04-01
Series:Big Data Analytics
Online Access:http://link.springer.com/article/10.1186/s41044-018-0032-1
Description
Summary:Abstract Background Unsupervised machine-learned analysis of cluster structures, applied using the emergent self-organizing feature maps (ESOM) combined with the unified distance matrix (U-matrix) has been shown to provide an unbiased method to identify true clusters. It outperforms classical hierarchical clustering algorithms that carry a considerable tendency to produce erroneous results. To facilitate the application of the ESOM/U-matrix method in biomedical research, we introduce the interactive R-based bioinformatics tool “Umatrix”, which enables valid identification of a biologically meaningful cluster structure in the data by training a Kohonen-type self-organizing map followed by interface-guided interactive clustering on the emergent U-matrix map. Results The ability to detect clinical relevant subgroups was applied to a data set comprising plasma concentrations of d = 25 lipid markers including endocannabinoids, lysophosphatidic acids, ceramides and sphingolipids acquired from n = 100 patients with Parkinson's disease and n = 100 controls. Following ESOM training, clear data structures in the high-dimensional data space were observed on the U-matrix, allowing separation of patients from controls almost perfectly. When the data structure was destroyed by Monte-Carlo random resampling, the U-matrix became unstructured and patients and controls were mixed. Obtained results are biologically plausible and supported by empirical evidence of a regulation of several classes of lipids in Parkinson's disease. Conclusions Sophisticated analysis of structures in biomedical data provides a basis for the mechanistic interpretation of the observations and facilitates subsequent analyses focusing on hypothesis testing. The freely available R library “Umatrix” provides an interactive tool for broader application of unsupervised machine learning on complex biomedical data.
ISSN:2058-6345