Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes.

Master of Science === Department of Computing and Information Sciences === Doina Caragea === As genomes are sequenced, a major challenge is their annotation -- the identification of genes and regulatory elements, their locations and their functions. For years, it was believed that one gene correspon...

Bibliographic Details
Main Author: Tangirala, Karthik
Language: en
Published: Kansas State University 2011
Subjects:
Online Access: http://hdl.handle.net/2097/12013
id ndltd-KSU-oai-krex.k-state.edu-2097-12013
record_format oai_dc
spelling ndltd-KSU-oai-krex.k-state.edu-2097-12013 2017-03-04T03:51:12Z 2011-08-12T13:14:50Z 2011-08-12T13:14:50Z 2011-08-12 2011 August Thesis http://hdl.handle.net/2097/12013 en Kansas State University
collection NDLTD
language en
sources NDLTD
topic Alternative splicing
Co training
Semi supervised learning
Transductive learning
Graph based approach
Bioinformatics (0715)
Computer Science (0984)
description Master of Science === Department of Computing and Information Sciences === Doina Caragea === As genomes are sequenced, a major challenge is their annotation: the identification of genes and regulatory elements, their locations, and their functions. For years, it was believed that one gene corresponds to one protein, but the discovery of alternative splicing provided a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. In recent years, it has become clear that a large fraction of genes undergoes alternative splicing, so understanding alternative splicing is a problem of great interest to biologists. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to produce accurate classifiers. While new sequencing technologies produce large amounts of genomic data, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches, which can make use of large amounts of unlabeled data in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their most confident predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: lengths of the exon of interest and its flanking introns, exonic splicing enhancers (ESE motifs), and intronic regulatory sequences (IRS motifs). Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study.
Experimental results show that using the unlabeled data can produce better classifiers than those obtained from the small amount of labeled data alone. In addition to semi-supervised approaches, we also study the usefulness of graph-based transductive learning approaches for predicting alternatively spliced exons. Like semi-supervised learning algorithms, transductive learning algorithms can make use of unlabeled data, together with labeled data, to produce labels for the unlabeled data. However, they do not learn a classification model that could be used to classify new unlabeled data. Experimental results show that graph-based transductive approaches can make effective use of the unlabeled data.
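The co-training loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the tiny Gaussian Naive Bayes class, the function name co_train, the parameters rounds and per_round, the class labels "AS" and "CONST", and the toy feature values (one length-like view and one motif-count-like view) are all assumptions made for illustration. As a simplification, each classifier labels only its single most confident unlabeled example per round.

```python
import math

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes over one view (hypothetical helper)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # Variance floor keeps tiny labeled sets from dividing by zero.
            variances = [max(sum((v - m) ** 2 for v in c) / len(rows), 1e-3)
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_proba(self, x):
        logp = {}
        for label, (prior, means, variances) in self.stats.items():
            ll = prior
            for v, m, s2 in zip(x, means, variances):
                ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            logp[label] = ll
        top = max(logp.values())
        expd = {k: math.exp(v - top) for k, v in logp.items()}
        total = sum(expd.values())
        return {k: v / total for k, v in expd.items()}


def fit_view(view, labels):
    """Train one classifier on the currently labeled examples of one view."""
    idx = [i for i, t in enumerate(labels) if t is not None]
    return TinyGaussianNB().fit([view[i] for i in idx], [labels[i] for i in idx])


def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """labels[i] is a class label, or None for an unlabeled example."""
    labels = list(labels)
    for _ in range(rounds):
        if all(t is not None for t in labels):
            break
        # Both classifiers are trained on the same labeled pool, each on its own view.
        classifiers = [(fit_view(view_a, labels), view_a),
                       (fit_view(view_b, labels), view_b)]
        for clf, view in classifiers:
            scored = []
            for i, t in enumerate(labels):
                if t is None:
                    proba = clf.predict_proba(view[i])
                    best = max(proba, key=proba.get)
                    scored.append((proba[best], i, best))
            # The most confident predictions join the shared labeled pool,
            # which is how each classifier informs the other.
            scored.sort(key=lambda s: s[0], reverse=True)
            for _, i, best in scored[:per_round]:
                labels[i] = best
    return fit_view(view_a, labels), fit_view(view_b, labels), labels


# Toy data (invented): view A resembles a length feature, view B a motif count.
view_a = [[100.0], [110.0], [300.0], [310.0], [105.0], [305.0], [95.0], [320.0]]
view_b = [[5.0], [6.0], [1.0], [0.0], [5.5], [0.5], [6.5], [1.5]]
seed_labels = ["AS", "AS", "CONST", "CONST", None, None, None, None]

clf_a, clf_b, final_labels = co_train(view_a, view_b, seed_labels)
print(final_labels)
```

Unlike the transductive setting mentioned in the abstract, the two classifiers returned here can still be applied to new examples, for instance clf_a.predict_proba([102.0]).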
author Tangirala, Karthik
title Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes.
publisher Kansas State University
publishDate 2011
url http://hdl.handle.net/2097/12013