GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.
Left-censored missing values are common in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper handling of missing values adversely affects subsequent statistical analyses. However, few imputation methods have been develop...
Main Authors: | Runmin Wei, Jingye Wang, Erik Jia, Tianlu Chen, Yan Ni, Wei Jia |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2018-01-01 |
Series: | PLoS Computational Biology |
Online Access: | http://europepmc.org/articles/PMC5809088?pdf=render |
id |
doaj-ced149225b9d4c729794aef1e59465c0 |
---|---|
record_format |
Article |
spelling |
doaj-ced149225b9d4c729794aef1e59465c0 2020-11-25T01:11:55Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2018-01-01 14 1 e1005973 10.1371/journal.pcbi.1005973 GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies. Runmin Wei, Jingye Wang, Erik Jia, Tianlu Chen, Yan Ni, Wei Jia http://europepmc.org/articles/PMC5809088?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Runmin Wei Jingye Wang Erik Jia Tianlu Chen Yan Ni Wei Jia |
author_sort |
Runmin Wei |
title |
GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Computational Biology |
issn |
1553-734X 1553-7358 |
publishDate |
2018-01-01 |
description |
Left-censored missing values are common in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper handling of missing values adversely affects subsequent statistical analyses. However, few imputation methods have been developed for and applied to the MNAR situation in metabolomics, so a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler-based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulated dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for large-scale metabolomics datasets. The R code for GSimp, the evaluation pipeline, a tutorial, and the real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. |
url |
http://europepmc.org/articles/PMC5809088?pdf=render |
_version_ |
1725168858371719168 |
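
The description above names the approach (iterative, sampler-based imputation of left-censored values) but does not spell out its steps. As a rough illustration of the general idea only, the Python sketch below fills values below a detection limit by repeatedly drawing from a normal distribution truncated at that limit. It is not the GSimp implementation, which is distributed as R code at the repository linked in the record. The function names, the single-variable setup, the `lod` parameter, and the plug-in re-estimation of the mean and standard deviation (a simplification of a full Gibbs sampler, which would also sample the parameters) are all assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean, pstdev


def draw_truncated_normal(mu, sigma, upper, rng):
    """Draw one value from N(mu, sigma) restricted to (-inf, upper]
    using inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    eps = 1e-12
    # inv_cdf needs a probability strictly between 0 and 1, so clamp.
    p_upper = min(max(dist.cdf(upper), 2 * eps), 1 - eps)
    u = rng.uniform(eps, p_upper)
    # min() guards against floating-point overshoot past the limit.
    return min(dist.inv_cdf(u), upper)


def impute_left_censored(values, lod, n_iter=50, seed=0):
    """Hypothetical helper: impute None entries (assumed to be values
    below the detection limit `lod`) for a single variable."""
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(values) if v is None]
    # Initialise censored entries at the detection limit.
    filled = [lod if v is None else v for v in values]
    for _ in range(n_iter):
        # Re-estimate the distribution from the completed data ...
        mu = fmean(filled)
        sigma = max(pstdev(filled), 1e-8)  # avoid a zero-width normal
        # ... then redraw every censored entry below the limit.
        for i in missing:
            filled[i] = draw_truncated_normal(mu, sigma, lod, rng)
    return filled


if __name__ == "__main__":
    observed = [5.1, 4.8, None, 6.0, None, 5.5]
    print(impute_left_censored(observed, lod=4.5))
```

Observed values are left untouched, and every imputed value stays at or below the assumed detection limit, which is the property that distinguishes left-censored imputation from mean or nearest-neighbour filling.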