Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study

Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically been approached with supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information.

Bibliographic Details
Main Authors: Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris
Format: Article
Language: English
Published: MDPI AG 2021-05-01
Series: Mathematical and Computational Applications
Subjects: semi-supervised; hierarchical mixture models; essential genes; genomic; integration
Online Access: https://www.mdpi.com/2297-8747/26/2/40
id doaj-f5ff1288fff2408ca294074f36b51fee
record_format Article
spelling doaj-f5ff1288fff2408ca294074f36b51fee
last_indexed 2021-06-01T00:23:01Z
language_code eng
volume 26, article 40
doi 10.3390/mca26020040
author_affiliation Michael W. Daniels: Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA
Daniel Dvorkin: The Bioinformatics CRO, Inc., Niceville, FL 32578, USA
Rani K. Powers: Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02155, USA
Katerina Kechris: Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
collection DOAJ
language English
format Article
sources DOAJ
author Michael W. Daniels
Daniel Dvorkin
Rani K. Powers
Katerina Kechris
publisher MDPI AG
series Mathematical and Computational Applications
issn 1300-686X
2297-8747
publishDate 2021-05-01
description Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically been approached with supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, that is, genes required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. Prediction performance also improved when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework are applicable to other problems that involve predicting genes or other features, where multiple data types characterize each feature and only a small set of positive labels is available.
topic semi-supervised
hierarchical mixture models
essential genes
genomic
integration
url https://www.mdpi.com/2297-8747/26/2/40