Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approac...
Main Authors: | Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-05-01 |
Series: | Mathematical and Computational Applications |
Subjects: | semi-supervised, hierarchical mixture models, essential genes, genomic, integration |
Online Access: | https://www.mdpi.com/2297-8747/26/2/40 |
id | doaj-f5ff1288fff2408ca294074f36b51fee |
---|---|
record_format | Article |
spelling | doaj-f5ff1288fff2408ca294074f36b51fee; 2021-06-01T00:23:01Z; eng; MDPI AG; Mathematical and Computational Applications; 1300-686X; 2297-8747; 2021-05-01; 26; 40; 40; 10.3390/mca26020040; Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study; Michael W. Daniels (0), Daniel Dvorkin (1), Rani K. Powers (2), Katerina Kechris (3); (0) Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA; (1) The Bioinformatics CRO, Inc., Niceville, FL 32578, USA; (2) Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02155, USA; (3) Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA; Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. There was also improved prediction performance when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework is standard for problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.; https://www.mdpi.com/2297-8747/26/2/40; semi-supervised; hierarchical mixture models; essential genes; genomic; integration |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris |
title | Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study |
publisher | MDPI AG |
series | Mathematical and Computational Applications |
issn | 1300-686X, 2297-8747 |
publishDate | 2021-05-01 |
description | Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. There was also improved prediction performance when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework is standard for problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels. |
topic | semi-supervised; hierarchical mixture models; essential genes; genomic; integration |
url | https://www.mdpi.com/2297-8747/26/2/40 |