Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction

RNA plays an important role in molecular biology. RNA sequence comparison is an important method for analyzing gene expression. Because aligning RNA reads must handle gaps, mutations, poly-A tails, and so on, it is much more difficult than aligning other sequences. In this thesis, we study the RNA-Seq...

Bibliographic Details
Main Authors: Li, Yaoman, 李耀满
Other Authors: Ting, HF
Language: English
Published: The University of Hong Kong (Pokfulam, Hong Kong) 2014
Subjects: Nucleotide sequence - Data processing
Online Access: http://hdl.handle.net/10722/195977
id ndltd-HKU-oai-hub.hku.hk-10722-195977
record_format oai_dc
spelling ndltd-HKU-oai-hub.hku.hk-10722-195977 2015-07-29T04:02:29Z Li, Yaoman 李耀满 Ting, HF Nucleotide sequence - Data processing published_or_final_version Computer Science Master Master of Philosophy 2014-03-21T03:50:02Z 2014-03-21T03:50:02Z 2013 PG_Thesis 10.5353/th_b5153733 b5153733 http://hdl.handle.net/10722/195977 eng HKU Theses Online (HKUTO) The author retains all proprietary rights (such as patent rights) and the right to use in future works. Creative Commons: Attribution 3.0 Hong Kong License The University of Hong Kong (Pokfulam, Hong Kong)
collection NDLTD
language English
sources NDLTD
topic Nucleotide sequence - Data processing
description RNA plays an important role in molecular biology, and RNA sequence comparison is an important method for analyzing gene expression. Because aligning RNA reads must handle gaps, mutations, poly-A tails, and so on, it is much more difficult than aligning other sequences. In this thesis, we study RNA-Seq alignment tools and the existing gene information database, and we investigate how to improve alignment accuracy and predict RNA secondary structure. The known gene information database contains a large amount of reliable gene information that has already been discovered. We also note that most DNA alignment tools are well developed: they run much faster than existing RNA-Seq alignment tools and have higher sensitivity and accuracy. Combining them with the known gene information database, we present a method for aligning RNA-Seq data using DNA alignment tools; that is, we use the DNA alignment tools to perform the alignment and then use the gene information to convert the alignments to genome-based coordinates. Although the gene information database is updated daily, many genes and alternative splicing events have not yet been discovered. If our RNA alignment tool relied only on the known gene database, many reads that come from unknown genes or alternative splicing events could not be aligned. We therefore present a combinational method that covers potential alternative splicing junction sites. Combined with the original gene database, the new alignment tool covers most of the alignments reported by other RNA-Seq alignment tools. Many RNA-Seq alignment tools have been developed recently. They are more powerful and faster than the previous generation of tools. However, RNA read alignment is much more complicated than other sequence alignment, and the alignments reported by some RNA-Seq alignment tools have low accuracy. We present a simple and efficient filtering method based on the quality scores of the reads; it filters out most low-accuracy alignments. Finally, we present an RNA secondary structure prediction method that can predict pseudoknots (a type of RNA secondary structure) with high sensitivity and specificity. === published_or_final_version === Computer Science === Master === Master of Philosophy
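The description mentions aligning reads with a DNA alignment tool and then using known gene information to convert the alignments to genome-based coordinates. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of one way such a conversion could work. It assumes a plus-strand transcript, an ungapped alignment, and exons given as half-open genomic intervals; the function name `transcript_to_genome` and the input layout are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

# Half-open genomic interval [start, end); an assumed input layout.
Exon = Tuple[int, int]


def transcript_to_genome(exons: List[Exon], t_start: int, length: int) -> List[Exon]:
    """Map an ungapped alignment on a spliced transcript to genomic blocks.

    exons:   plus-strand exons sorted by genomic start (illustrative format).
    t_start: 0-based alignment start in transcript coordinates.
    length:  number of aligned bases.
    Returns one genomic block per exon the alignment overlaps; a gap
    between consecutive blocks corresponds to a splice junction.
    """
    if t_start < 0 or length <= 0:
        raise ValueError("t_start must be >= 0 and length must be > 0")

    t_end = t_start + length
    blocks: List[Exon] = []
    offset = 0  # transcript coordinate of the current exon's first base

    for g_start, g_end in exons:
        exon_len = g_end - g_start
        # Overlap of the alignment with this exon, in transcript coordinates.
        lo = max(t_start, offset)
        hi = min(t_end, offset + exon_len)
        if lo < hi:
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += exon_len

    if t_end > offset:
        raise ValueError("alignment extends past the end of the transcript")
    return blocks


if __name__ == "__main__":
    # Made-up annotation: three exons of lengths 50, 60, and 100.
    toy_exons = [(100, 150), (200, 260), (300, 400)]
    # A 30-base read starting at transcript position 40 spans the first junction.
    print(transcript_to_genome(toy_exons, 40, 30))
```

In this toy case the read is split into two genomic blocks separated by the intron between the first and second exons. A real converter would also need to handle minus-strand genes, insertions and deletions in the alignment, and reads that match several transcripts of the same gene.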
author2 Ting, HF
author_facet Ting, HF
Li, Yaoman
李耀满
author Li, Yaoman
李耀满
author_sort Li, Yaoman
title Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction
publisher The University of Hong Kong (Pokfulam, Hong Kong)
publishDate 2014
url http://hdl.handle.net/10722/195977