BALLI: Bartlett-adjusted likelihood-based linear model approach for identifying differentially expressed genes with RNA-seq data

Abstract Background Transcriptomic profiles can improve our understanding of the phenotypic molecular basis of biological research, and many statistical methods have been proposed to identify differentially expressed genes (DEGs) under two or more conditions with RNA-seq data. However, statistical a...

Bibliographic Details
Main Authors: Kyungtaek Park, Jaehoon An, Jungsoo Gim, Minseok Seo, Woojoo Lee, Taesung Park, Sungho Won
Format: Article
Language: English
Published: BMC 2019-07-01
Series: BMC Genomics
Subjects: Differentially expressed genes; RNA sequencing; Linear mixed model; Bartlett’s correction
Online Access: http://link.springer.com/article/10.1186/s12864-019-5851-6
id doaj-6be7bce3fd3f484c812eb7b8af5ca8fd
record_format Article
spelling doaj-6be7bce3fd3f484c812eb7b8af5ca8fd; 2020-11-25T03:07:32Z; eng; BMC; BMC Genomics; 1471-2164; 2019-07-01; volume 20, issue 1, pages 1-14; 10.1186/s12864-019-5851-6
Kyungtaek Park (Interdisciplinary Program of Bioinformatics, Seoul National University)
Jaehoon An (Department of Public Health Science, Seoul National University)
Jungsoo Gim (Department of Biomedical Science, Chosun University)
Minseok Seo (Channing Division of Network Medicine, Brigham and Women’s Hospital)
Woojoo Lee (Department of Statistics, Inha University)
Taesung Park (Interdisciplinary Program of Bioinformatics, Seoul National University)
Sungho Won (Interdisciplinary Program of Bioinformatics, Seoul National University)
collection DOAJ
language English
format Article
sources DOAJ
author Kyungtaek Park
Jaehoon An
Jungsoo Gim
Minseok Seo
Woojoo Lee
Taesung Park
Sungho Won
title BALLI: Bartlett-adjusted likelihood-based linear model approach for identifying differentially expressed genes with RNA-seq data
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2019-07-01
description Abstract Background: Transcriptomic profiles can improve our understanding of the phenotypic molecular basis of biological research, and many statistical methods have been proposed to identify differentially expressed genes (DEGs) under two or more conditions with RNA-seq data. However, statistical analyses with RNA-seq data are often limited by small sample sizes, and global variance estimates of RNA expression levels have been utilized as prior distributions for gene-specific variance estimates, making it difficult to generalize the methods to more complicated settings. We herein proposed a Bartlett-Adjusted Likelihood-based LInear mixed model approach (BALLI) to analyze more complicated RNA-seq data. The proposed method estimates the technical and biological variances with a linear mixed-effects model, with and without adjusting for small-sample bias using Bartlett’s correction. Results: We conducted extensive simulations to compare the performance of BALLI with that of existing approaches (edgeR, DESeq2, and voom). Results from the simulation studies showed that BALLI correctly controlled the type-1 error rates at various nominal significance levels and produced better statistical power and precision estimates than those of other competing methods in various scenarios. Furthermore, BALLI was robust to variation in library size. It was also successfully applied to Holstein milk yield data, illustrating its practical value. Conclusions: BALLI is statistically more efficient and valid than existing methods, and we conclude that it is useful for identifying DEGs in RNA-seq analysis.
topic Differentially expressed genes
RNA sequencing
Linear mixed model
Bartlett’s correction
url http://link.springer.com/article/10.1186/s12864-019-5851-6