Summary: | Abstract Background: Unsupervised machine-learned analysis of cluster structures using emergent self-organizing feature maps (ESOM) combined with the unified distance matrix (U-matrix) has been shown to provide an unbiased method to identify true clusters. It outperforms classical hierarchical clustering algorithms, which carry a considerable tendency to produce erroneous results. To facilitate the application of the ESOM/U-matrix method in biomedical research, we introduce the interactive R-based bioinformatics tool “Umatrix”, which enables valid identification of a biologically meaningful cluster structure in the data by training a Kohonen-type self-organizing map followed by interface-guided interactive clustering on the emergent U-matrix map. Results: The ability to detect clinically relevant subgroups was tested on a data set comprising plasma concentrations of d = 25 lipid markers, including endocannabinoids, lysophosphatidic acids, ceramides and sphingolipids, acquired from n = 100 patients with Parkinson's disease and n = 100 controls. Following ESOM training, clear data structures in the high-dimensional data space were observed on the U-matrix, allowing almost perfect separation of patients from controls. When the data structure was destroyed by Monte-Carlo random resampling, the U-matrix became unstructured and patients and controls were mixed. The results obtained are biologically plausible and supported by empirical evidence of a regulation of several classes of lipids in Parkinson's disease. Conclusions: Sophisticated analysis of structures in biomedical data provides a basis for the mechanistic interpretation of the observations and facilitates subsequent analyses focusing on hypothesis testing. The freely available R library “Umatrix” provides an interactive tool for broader application of unsupervised machine learning on complex biomedical data.
|