CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis

Abstract Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expec...


Bibliographic Details
Main Authors:	Olga Permiakova, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse, Thomas Burger
Format:	Article
Language:	English
Published:	BMC 2021-02-01
Series:	BMC Bioinformatics
Subjects:	Large-scale cluster analysis; Liquid chromatography; Mass spectrometry; Proteomics; Wasserstein kernel; Optimal transport
Online Access:	https://doi.org/10.1186/s12859-021-03969-0

id	doaj-09ef442cca6e4bf58b38e9b9338b97f7
record_format	Article
spelling	doaj-09ef442cca6e4bf58b38e9b9338b97f7; 2021-02-14T12:50:53Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2021-02-01; volume 22, issue 1, pages 1-30; 10.1186/s12859-021-03969-0; author affiliations: Olga Permiakova, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse (Univ. Grenoble Alpes, CEA, Inserm, BGE U1038); Thomas Burger (Univ. Grenoble Alpes, CNRS, CEA, Inserm, BGE U1038)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Olga Permiakova, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse, Thomas Burger
title	CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2021-02-01
description	Abstract. Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most state-of-the-art methods rely on single-pass linkage-based algorithms. Results: We propose a clustering algorithm that optimizes the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, the Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters which differ from those resulting from state-of-the-art methods, while achieving similar performance. This highlights the complementarity of algorithms built on different principles for extracting the best from complex LC-MS data.
topic	Large-scale cluster analysis; Liquid chromatography; Mass spectrometry; Proteomics; Wasserstein kernel; Optimal transport
url	https://doi.org/10.1186/s12859-021-03969-0


Similar Items