Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study

Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approac...


Bibliographic Details
Main Authors:	Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris
Format:	Article
Language:	English
Published:	MDPI AG 2021-05-01
Series:	Mathematical and Computational Applications
Subjects:	semi-supervised; hierarchical mixture models; essential genes; genomic; integration
Online Access:	https://www.mdpi.com/2297-8747/26/2/40

id	doaj-f5ff1288fff2408ca294074f36b51fee
record_format	Article
spelling	doaj-f5ff1288fff2408ca294074f36b51fee 2021-06-01T00:23:01Z; eng; MDPI AG; Mathematical and Computational Applications; ISSN 1300-686X, 2297-8747; 2021-05-01; vol. 26, no. 2, art. 40; doi:10.3390/mca26020040; Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study; Michael W. Daniels (Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA); Daniel Dvorkin (The Bioinformatics CRO, Inc., Niceville, FL 32578, USA); Rani K. Powers (Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02155, USA); Katerina Kechris (Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris
title	Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
publisher	MDPI AG
series	Mathematical and Computational Applications
issn	1300-686X, 2297-8747
publishDate	2021-05-01
description	Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. There was also improved prediction performance when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework is general for problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.
topic	semi-supervised; hierarchical mixture models; essential genes; genomic; integration
url	https://www.mdpi.com/2297-8747/26/2/40
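The abstract describes a mixture model in which a small set of positively labeled genes guides otherwise unsupervised learning. As a rough illustration of that general semi-supervised idea only, the sketch below runs expectation-maximization (EM) on a two-class mixture (essential vs. non-essential) and clamps the posterior of labeled positives to 1, while unlabeled genes keep soft posteriors instead of being treated as negatives. It is not the authors' hierarchical mixture model: the per-class univariate Gaussians, the conditional independence of data types given the class, the variance floor, and the names `semi_supervised_em`, `positives`, `n_iter`, and `min_var` are all hypothetical simplifications chosen for this example.

```python
import math
from typing import List, Sequence, Set


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def semi_supervised_em(
    data: Sequence[Sequence[float]],
    positives: Set[int],
    n_iter: int = 100,
    min_var: float = 1e-3,
) -> List[float]:
    """Hypothetical sketch: estimate P(essential | data) for every gene.

    data[i][d] is gene i's value for data type d. ASSUMPTION: each data type
    is a univariate Gaussian within each class, and data types are
    conditionally independent given the class. Genes whose index is in
    `positives` are labeled essential and their posterior is clamped to 1;
    all other genes are unlabeled (not assumed non-essential).
    """
    n, n_types = len(data), len(data[0])
    # Start labeled genes at 1 and unlabeled genes at an uninformative 0.5.
    resp = [1.0 if i in positives else 0.5 for i in range(n)]
    for _ in range(n_iter):
        # M-step: class weight (kept away from 0 and 1) and Gaussian parameters.
        w1 = min(max(sum(resp) / n, 1e-6), 1.0 - 1e-6)
        params = []  # params[d] = [(mean0, var0), (mean1, var1)]
        for d in range(n_types):
            per_class = []
            # Class 0 (non-essential) weights first, then class 1 (essential).
            for weights in ([1.0 - r for r in resp], resp):
                total = max(sum(weights), 1e-12)
                mean = sum(w * x[d] for w, x in zip(weights, data)) / total
                var = sum(w * (x[d] - mean) ** 2 for w, x in zip(weights, data)) / total
                # Floor the variance so a class cannot collapse onto one gene.
                per_class.append((mean, max(var, min_var)))
            params.append(per_class)
        # E-step: update posteriors of unlabeled genes; labeled genes stay at 1.
        new_resp = []
        for i, x in enumerate(data):
            if i in positives:
                new_resp.append(1.0)
                continue
            log_odds = math.log(w1) - math.log(1.0 - w1)
            for d in range(n_types):
                (m0, v0), (m1, v1) = params[d]
                log_odds += normal_logpdf(x[d], m1, v1) - normal_logpdf(x[d], m0, v0)
            new_resp.append(sigmoid(log_odds))
        resp = new_resp
    return resp
```

For example, with two data types per gene, `semi_supervised_em(data, positives={5, 6})` returns one posterior per gene, with genes 5 and 6 fixed at 1.0. Clamping only the positives is what lets the sketch use a small positive-only label set; a real analysis would also need convergence checks, several initializations, and the model structure actually used in the paper.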

Similar Items