Prediction of gene expression with cis-SNPs using mixed models and regularization methods

Abstract Background: Gene expression in human tissues has been shown to be heritable, which makes it possible to predict gene expression using only SNPs. Such prediction has important implications for understanding the genetic architecture of individual functionally associated SNPs and furt...

Bibliographic Details
Main Authors: Ping Zeng, Xiang Zhou, Shuiping Huang
Format: Article
Language: English
Published: BMC 2017-05-01
Series: BMC Genomics
Subjects: Gene expression; Cis-SNPs; Prediction model; Linear mixed model; Lasso; Elastic net
Online Access: http://link.springer.com/article/10.1186/s12864-017-3759-6
id doaj-deac1b7b8c3e4b9a916420e60843124b
record_format Article
spelling doaj-deac1b7b8c3e4b9a916420e60843124b 2020-11-25T00:26:06Z eng BMC BMC Genomics 1471-2164 2017-05-01 181111 10.1186/s12864-017-3759-6 Prediction of gene expression with cis-SNPs using mixed models and regularization methods
Ping Zeng (0), Xiang Zhou (1), Shuiping Huang (2)
(0) Department of Epidemiology and Biostatistics, Xuzhou Medical University
(1) Department of Biostatistics, University of Michigan
(2) Department of Epidemiology and Biostatistics, Xuzhou Medical University
collection DOAJ
language English
format Article
sources DOAJ
author Ping Zeng
Xiang Zhou
Shuiping Huang
title Prediction of gene expression with cis-SNPs using mixed models and regularization methods
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2017-05-01
description Abstract Background: Gene expression in human tissues has been shown to be heritable, which makes it possible to predict gene expression using only SNPs. Such prediction has important implications for understanding the genetic architecture of individual functionally associated SNPs and for interpreting the molecular basis of human diseases. Methods: We compared three types of methods for predicting gene expression using only cis-SNPs: a polygenic model, i.e. the linear mixed model (LMM); two sparse models, i.e. Lasso and elastic net (ENET); and a hybrid of LMM and sparse models, i.e. the Bayesian sparse linear mixed model (BSLMM). The three types of methods make very different assumptions about the underlying genetic architecture. The methods were evaluated in simulations under various scenarios and were applied to the Geuvadis gene expression data. Results: In the simulations, each of the four prediction methods (Lasso, ENET, LMM and BSLMM) performed best when its modeling assumptions were satisfied, while BSLMM performed robustly across a range of scenarios. Judged by R² in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, nor any clear relationship between the proportion of predictive genes and the proportion of genes on each chromosome. However, highly predictive genes (e.g. R² ≥ 0.30) in the Geuvadis data may have sparse genetic architectures, since Lasso, ENET and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks. Conclusions: Gene expression can be predicted with only cis-SNPs using well-developed prediction models, and the predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation of SNPs identified in GWASs.
topic Gene expression
Cis-SNPs
Prediction model
Linear mixed model
Lasso
Elastic net
url http://link.springer.com/article/10.1186/s12864-017-3759-6