Utilities for Off-Target DNA Mining in Non-Model Organisms and Querying for Phylogenetic Patterns
High throughput sequencing data are rich in information and contain many off-target sequences (reads) that are often ignored but may be biologically relevant. Seed extension, a combination of reference and de novo based assembly methods, can be used to extract the information but it is time-consumin...
Other Authors: | |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | |
Online Access: | http://purl.flvc.org/fsu/fd/2018_Sp_Mechtley_fsu_0071E_14520_comp |
id |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_657910 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_657910 2019-07-01T05:20:51Z Utilities for Off-Target DNA Mining in Non-Model Organisms and Querying for Phylogenetic Patterns Mechtley, Alisha (author) Lemmon, Alan R (professor directing dissertation) Arbeitman, Michelle N. (university representative) Meyer-Bäse, Anke (committee member) Beerli, Peter (committee member) Slice, Dennis E. (committee member) Florida State University (degree granting institution) College of Arts and Sciences (degree granting college) Department of Scientific Computing (degree granting department) Text doctoral thesis Florida State University English eng 1 online resource (73 pages) computer application/pdf High throughput sequencing data are rich in information and contain many off-target sequences (reads) that are often ignored but may be biologically relevant. Seed extension, a combination of reference and de novo based assembly methods, can be used to extract the information but it is time-consuming to implement because it requires that multiple seeds (sequences from one or many closely related species) be gathered in advance. A new tool is presented here, SeedSQrrL, that can automatically crawl the web to gather the seeds from the closest taxonomic relative for each gene and store them in a relational database. The seeds can then be used to create multiple seed extensions which are later combined into a reference or used for downstream phylogenetic analysis. Patterns in the resulting gene trees can be searched for using the traditional methods of tree comparison (Robinson-Foulds topological distance and branch-length comparison methods). Currently, no open source tree pattern matching program exists that allows the user to modify algorithms and create their own custom pattern matching functions. I have worked on such a tool, called Treematcher, and it will be made available in the ETE Toolkit (a Python Environment for Tree Exploration). 
Three biological case studies are included to demonstrate the capabilities of the two programs: 1) a custom Treematcher function that performs a regular expression-like query, 2) the use of SeedSQrrL to isolate mitochondrial genes from snakes and chloroplast genes from angiosperms, and 3) the assembly of a large case study of animals. A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Spring Semester 2018. April 2, 2018. Automated Gene Reference Collection, Gene Tree Pattern Matching, High Throughput Sequence Analysis, NCBI Taxonomy, Open Source Software for Bioinformatics, Python. Includes bibliographical references. Alan Lemmon, Professor Directing Dissertation; Michelle Arbeitman, University Representative; Anke Meyer-Baese, Committee Member; Peter Beerli, Committee Member; Dennis Slice, Committee Member. Bioinformatics Biology--Classification Information science 2018_Sp_Mechtley_fsu_0071E_14520_comp http://purl.flvc.org/fsu/fd/2018_Sp_Mechtley_fsu_0071E_14520_comp http://diginole.lib.fsu.edu/islandora/object/fsu%3A657910/datastream/TN/view/Utilities%20for%20Off-Target%20DNA%20Mining%20in%20Non-Model%20Organisms%20and%20Querying%20for%20Phylogenetic%20Patterns.jpg |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Bioinformatics Biology--Classification Information science |
description |
High throughput sequencing data are rich in information and contain many off-target sequences (reads) that are often ignored but may be biologically relevant. Seed extension, a combination of reference and de novo based assembly methods, can be used to extract the information but it is time-consuming to implement because it requires that multiple seeds (sequences from one or many closely related species) be gathered in advance. A new tool is presented here, SeedSQrrL, that can automatically crawl the web to gather the seeds from the closest taxonomic relative for each gene and store them in a relational database. The seeds can then be used to create multiple seed extensions which are later combined into a reference or used for downstream phylogenetic analysis. Patterns in the resulting gene trees can be searched for using the traditional methods of tree comparison (Robinson-Foulds topological distance and branch-length comparison methods). Currently, no open source tree pattern matching program exists that allows the user to modify algorithms and create their own custom pattern matching functions. I have worked on such a tool, called Treematcher, and it will be made available in the ETE Toolkit (a Python Environment for Tree Exploration). Three biological case studies are included to demonstrate the capabilities of the two programs: 1) a custom Treematcher function that performs a regular expression-like query, 2) the use of SeedSQrrL to isolate mitochondrial genes from snakes and chloroplast genes from angiosperms, and 3) the assembly of a large case study of animals. === A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy. === Spring Semester 2018. === April 2, 2018. === Automated Gene Reference Collection, Gene Tree Pattern Matching, High Throughput Sequence Analysis, NCBI Taxonomy, Open Source Software for Bioinformatics, Python === Includes bibliographical references. === Alan Lemmon, Professor Directing Dissertation; Michelle Arbeitman, University Representative; Anke Meyer-Baese, Committee Member; Peter Beerli, Committee Member; Dennis Slice, Committee Member. |
author2 |
Mechtley, Alisha (author) |
title |
Utilities for Off-Target DNA Mining in Non-Model Organisms and Querying for Phylogenetic Patterns |
publisher |
Florida State University |
url |
http://purl.flvc.org/fsu/fd/2018_Sp_Mechtley_fsu_0071E_14520_comp |
_version_ |
1719218117269258240 |
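
The description names the Robinson-Foulds topological distance as one of the traditional tree comparison methods. As a rough illustration only, the sketch below computes that distance for two unrooted trees over the same leaves. It is not code from the dissertation, Treematcher, or the ETE Toolkit: the nested-tuple tree encoding and the function names `leaf_sets`, `bipartitions`, and `robinson_foulds` are assumptions made for this example.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`; append each internal node's leaf set to `out`.

    Assumed encoding (hypothetical): a leaf is a string, an internal node is a
    tuple of child subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Return (all leaves, set of non-trivial splits) for the tree viewed as unrooted."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # fixed reference leaf used to orient every split
    splits = set()
    for clade in clades:
        # Represent each split by the side that does NOT contain the anchor leaf,
        # so the same split gets the same key however the tree is rooted.
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (one leaf versus the rest) and the root's empty split.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(t1, t2):
    """Count the splits found in exactly one of the two trees."""
    leaves1, splits1 = bipartitions(t1)
    leaves2, splits2 = bipartitions(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    return len(splits1 ^ splits2)


if __name__ == "__main__":
    tree_a = ((("A", "B"), "C"), ("D", "E"))
    tree_b = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(tree_a, tree_b))
```

A distance of zero means the two trees imply the same set of splits. Because each split is keyed by the side that excludes a fixed leaf, re-rooting a tree does not change the result.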