GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.

Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been develop...

Bibliographic Details
Main Authors: Runmin Wei, Jingye Wang, Erik Jia, Tianlu Chen, Yan Ni, Wei Jia
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC5809088?pdf=render
id doaj-ced149225b9d4c729794aef1e59465c0
record_format Article
spelling doaj-ced149225b9d4c729794aef1e59465c0 2020-11-25T01:11:55Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2018-01-01 14 1 e1005973 10.1371/journal.pcbi.1005973 GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies. Runmin Wei Jingye Wang Erik Jia Tianlu Chen Yan Ni Wei Jia Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. http://europepmc.org/articles/PMC5809088?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Runmin Wei
Jingye Wang
Erik Jia
Tianlu Chen
Yan Ni
Wei Jia
author_sort Runmin Wei
title GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-01-01
description Left-censored missing values are common in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper handling of missing values adversely affects subsequent statistical analyses. However, few imputation methods have been developed for and applied to MNAR data in metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler-based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulated dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for large-scale metabolomics datasets. The R code for GSimp, the evaluation pipeline, a tutorial, and the real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp.
url http://europepmc.org/articles/PMC5809088?pdf=render