CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis

Bibliographic Details
Main Authors:	Burger, T. (Author), Fortin, T. (Author), Guibert, R. (Author), Hesse, A.-M. (Author), Kraut, A. (Author), Permiakova, O. (Author)
Format:	Article
Language:	English
Published:	BioMed Central Ltd 2021
Subjects:	algorithm; Algorithms; chemistry; Chromatography, Liquid; cluster analysis; Cluster Analysis; Data Compression; Data mining; Data representations; Essential features; Hierarchical clustering; Hierarchical strategies; Hierarchical systems; information processing; K-means clustering; Large-scale cluster analysis; Learning algorithms; liquid chromatography; Liquid chromatography; Machine learning; mass spectrometry; Mass spectrometry; Mass Spectrometry; Mass spectrometry analysis; Mass spectrometry data; Multi-core machines; Objective functions; Optimal transport; peptide; Peptides; procedures; proteomics; Proteomics; State-of-the-art methods; Wasserstein kernel
Online Access:	View Fulltext in Publisher


LEADER	03584nam a2200661Ia 4500
001	10.1186-s12859-021-03969-0
008	220427s2021 CNT 000 0 und d
020			|a 14712105 (ISSN)
245	1	0	|a CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis
260		0	|b BioMed Central Ltd |c 2021
856			|z View Fulltext in Publisher |u https://doi.org/10.1186/s12859-021-03969-0
520	3		|a Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most of the state of the art relies on single-pass linkage-based algorithms. Results: We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters that differ from those produced by state-of-the-art methods, while achieving similar performance. This highlights the complementarity of differently principled algorithms for extracting the most from complex LC-MS data. © 2021, The Author(s).
650	0	4	|a algorithm
650	0	4	|a Algorithms
650	0	4	|a chemistry
650	0	4	|a Chromatography, Liquid
650	0	4	|a cluster analysis
650	0	4	|a Cluster Analysis
650	0	4	|a Data Compression
650	0	4	|a Data mining
650	0	4	|a Data representations
650	0	4	|a Essential features
650	0	4	|a Hierarchical clustering
650	0	4	|a Hierarchical strategies
650	0	4	|a Hierarchical systems
650	0	4	|a information processing
650	0	4	|a K-means clustering
650	0	4	|a Large-scale cluster analysis
650	0	4	|a Learning algorithms
650	0	4	|a liquid chromatography
650	0	4	|a Liquid chromatography
650	0	4	|a Liquid chromatography
650	0	4	|a Machine learning
650	0	4	|a mass spectrometry
650	0	4	|a Mass spectrometry
650	0	4	|a Mass spectrometry
650	0	4	|a Mass Spectrometry
650	0	4	|a Mass spectrometry analysis
650	0	4	|a Mass spectrometry data
650	0	4	|a Multi-core machines
650	0	4	|a Objective functions
650	0	4	|a Optimal transport
650	0	4	|a peptide
650	0	4	|a Peptides
650	0	4	|a procedures
650	0	4	|a proteomics
650	0	4	|a Proteomics
650	0	4	|a Proteomics
650	0	4	|a Proteomics
650	0	4	|a State-of-the-art methods
650	0	4	|a Wasserstein kernel
700	1		|a Burger, T. |e author
700	1		|a Fortin, T. |e author
700	1		|a Guibert, R. |e author
700	1		|a Hesse, A.-M. |e author
700	1		|a Kraut, A. |e author
700	1		|a Permiakova, O. |e author
773			|t BMC Bioinformatics
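
The abstract describes CHICKN as a scalable solver for the kernel k-means objective that combines a compressive data representation, Nyström approximation and a hierarchical strategy, together with optimal-transport kernels between chromatographic elution profiles. The Python sketch below illustrates only two of these ingredients, under stated assumptions, and is not the authors' implementation. It assumes each elution profile is a nonnegative intensity vector sampled on a shared retention-time grid, and it uses a Gaussian kernel of the squared one-dimensional Wasserstein-2 distance as a stand-in for the kernels proposed in the paper. The compressive representation and the hierarchical strategy are omitted, and all function and parameter names (`nystrom_wasserstein_kmeans`, `gamma`, `n_landmarks`, `n_levels`) are hypothetical.

```python
# Hypothetical illustration, not the CHICKN implementation: the kernel choice,
# the landmark sampling and all names below are assumptions made for this sketch.
import math
import random


def quantile_function(profile, grid, levels):
    """Generalized inverse CDF of a nonnegative profile on `grid`, at sorted `levels` in (0, 1)."""
    total = sum(profile)
    cdf, acc = [], 0.0
    for v in profile:
        acc += v / total
        cdf.append(acc)
    out, j = [], 0
    for q in levels:
        # Smallest grid point whose cumulative mass reaches level q.
        while j < len(cdf) - 1 and cdf[j] < q:
            j += 1
        out.append(grid[j])
    return out


def wasserstein2_sq(q1, q2):
    """Squared 1-D Wasserstein-2 distance: mean squared gap between quantile functions."""
    return sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1)


def cholesky(a, jitter=1e-6):
    """Lower-triangular L with L L^T = a + jitter * I (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s + jitter)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ z = b for z by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nystrom_wasserstein_kmeans(profiles, grid, k, n_landmarks, gamma=1.0, n_levels=64, seed=0):
    """Approximate kernel k-means with k(p, p') = exp(-gamma * W2(p, p')^2) and Nystrom features."""
    levels = [(i + 0.5) / n_levels for i in range(n_levels)]
    qs = [quantile_function(p, grid, levels) for p in profiles]

    def kern(a, b):
        return math.exp(-gamma * wasserstein2_sq(a, b))

    landmarks = random.Random(seed).sample(range(len(qs)), n_landmarks)
    low = cholesky([[kern(qs[i], qs[j]) for j in landmarks] for i in landmarks])
    # Nystrom feature map: phi(x) = L^{-1} [k(x, landmark_1), ..., k(x, landmark_m)],
    # so phi(x) . phi(y) = k_m(x)^T K_mm^{-1} k_m(y) approximates k(x, y).
    feats = [forward_solve(low, [kern(q, qs[j]) for j in landmarks]) for q in qs]
    # Plain k-means on phi(x) approximates kernel k-means on the original profiles.
    return kmeans(feats, k)


# Usage on synthetic data: six Gaussian elution peaks, three near retention time 10, three near 30.
grid = [float(t) for t in range(50)]
profiles = [[math.exp(-((t - mu) ** 2) / 8.0) for t in grid] for mu in (9, 10, 11, 29, 30, 31)]
labels = nystrom_wasserstein_kmeans(profiles, grid, k=2, n_landmarks=4, gamma=0.1)
print(labels)  # expected: [0, 0, 0, 1, 1, 1]
```

In one dimension, the Wasserstein-2 distance equals the L2 distance between quantile functions, so exp(-gamma * W2^2) is a positive definite kernel; this is what makes the Cholesky factorization of the landmark kernel matrix valid (the small diagonal jitter only guards against round-off). For a fixed number of landmarks, computing the Nyström features and running k-means on them costs time linear in the number of profiles, which is the kind of scaling the abstract targets.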
