A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data

Abstract Background Identifying differentially expressed genes between the same or different species is an urgent demand for biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalizat...

Bibliographic Details
Main Authors: Yan Zhou, Bin Yang, Junhui Wang, Jiadi Zhu, Guoliang Tian
Format: Article
Language: English
Published: BMC 2021-06-01
Series: BMC Genomics
Subjects: Minimum enclosing ball; Differentially expressed genes; RNA-seq data
Online Access: https://doi.org/10.1186/s12864-021-07790-0
id doaj-0179749da88a4cd480f6f14692317e68
record_format Article
spelling doaj-0179749da88a4cd480f6f14692317e68
2021-06-27T11:22:13Z
eng
BMC
BMC Genomics
1471-2164
2021-06-01
22 1 1 14
10.1186/s12864-021-07790-0
A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data
Yan Zhou [0]
Bin Yang [1]
Junhui Wang [2]
Jiadi Zhu [3]
Guoliang Tian [4]
[0] College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University
[1] College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University
[2] School of Data Science
[3] College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University
[4] Department of Statistics and Data Science, Southern University of Science and Technology
https://doi.org/10.1186/s12864-021-07790-0
Minimum enclosing ball
Differentially expressed genes
RNA-seq data
collection DOAJ
language English
format Article
sources DOAJ
author Yan Zhou
Bin Yang
Junhui Wang
Jiadi Zhu
Guoliang Tian
title A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2021-06-01
description Abstract
Background: Identifying differentially expressed genes within the same species or between different species is an urgent need in biological and medical research. RNA-seq experiments are usually affected by systematic technical effects and differing sequencing depths, so normalization is regarded as an essential step in discovering biologically important changes in expression. Existing methods usually normalize the data with a scaling factor and then detect significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not even work. The development of modern machine learning techniques has provided a new perspective on discriminating between differentially expressed (DE) and non-DE genes. In practice, however, the non-DE genes comprise only a small set, which may contain housekeeping genes (in the same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data.
Results: In this study, we transform the problem into an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method that constructs the smallest possible ball containing the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then naturally be considered DE genes. Compared with existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method can be easily extended to different species without normalization.
Conclusions: Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or come from biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB.
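The enclosing-ball idea in the abstract can be illustrated with a minimal Python sketch. It is not the SFMEB implementation from the paper or the MEB R package: it fits an approximate ball in the raw input space with the Badoiu-Clarkson iteration (a simpler, swapped-in technique), whereas the abstract describes a ball built in a feature space. The function names, the toy points, and the iteration count are all assumptions made for illustration.

```python
import numpy as np


def minimum_enclosing_ball(points, n_iter=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the Badoiu-Clarkson iteration: repeatedly move the center a
    shrinking step toward the currently farthest point. Hypothetical
    helper for illustration only, not the SFMEB method itself.
    """
    pts = np.asarray(points, dtype=float)
    center = pts[0].copy()
    for k in range(1, n_iter + 1):
        dists = np.linalg.norm(pts - center, axis=1)
        farthest = pts[np.argmax(dists)]
        center += (farthest - center) / (k + 1)
    # Radius is the largest distance to the final center, so the ball
    # always contains every reference point.
    radius = np.linalg.norm(pts - center, axis=1).max()
    return center, radius


def flag_outside(points, center, radius, tol=1e-9):
    """Return True for each point lying outside the ball (outlier)."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts - center, axis=1)
    return dists > radius * (1 + tol)


# Toy data (assumed): four "known non-DE" reference points in 2-D.
reference = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
center, radius = minimum_enclosing_ball(reference)

# Candidates: one near the middle of the ball, one far outside it.
candidates = np.array([[1.0, 1.0], [5.0, 5.0]])
flags = flag_outside(candidates, center, radius)
```

In this sketch, points flagged `True` play the role of DE genes and the reference points play the role of known non-DE genes. How each gene is turned into a feature vector is left open here, since the abstract does not specify it.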
topic Minimum enclosing ball
Differentially expressed genes
RNA-seq data
url https://doi.org/10.1186/s12864-021-07790-0