id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1388502159
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu13885021592021-08-03T06:21:20Z Techniques for Storing and Processing Next-Generation DNA Sequencing Data Camerlengo, Terry Luke Bioinformatics DNA sequence storage 4 bit encoding reference-based compression Needleman-Wunsch DNA base pair compression sequence compression MongoDB NGS Data management bioinformatics NoSQL 3 bases per byte Genomics is undergoing unprecedented transformation due to rapid improvements in genetic sequencing technology, which have lowered costs for genetic sequencing experiments while increasing the amount of data generated in a typical experiment (McKinsey Global Institute, May 2013, pp. 86-94). The increase in data has shifted the burden from analysis and research to expertise in IT hardware and network support for distributed and efficient processing. Bioinformaticians, in response to a data-rich environment, are challenged to develop better and faster algorithms to solve problems in genomics and molecular biology research. This thesis examines the storage and data processing issues inherent in next-generation DNA sequencing (NGS). This work details the design and implementation of a software prototype that exemplifies current approaches to the efficient storage of NGS data. The software library is utilized within the context of a previous software project which accompanies the publication related to the HT_SOSA assay. The software for the HT_SOSA, called NGSPositionCounter, demonstrates a workflow that is common in a molecular biology research lab. To scale beyond the research institute, the software library’s architecture accounts for the data storage and processing demands that are more likely to be encountered in a clinical or commercial enterprise. 
2014-06-02 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159 unrestricted This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center.
collection NDLTD
language English
sources NDLTD
topic Bioinformatics
DNA sequence storage
4 bit encoding
reference-based compression
Needleman-Wunsch
DNA base pair compression
sequence compression
MongoDB
NGS Data management
bioinformatics
NoSQL
3 bases per byte

spellingShingle Bioinformatics
DNA sequence storage
4 bit encoding
reference-based compression
Needleman-Wunsch
DNA base pair compression
sequence compression
MongoDB
NGS Data management
bioinformatics
NoSQL
3 bases per byte

Camerlengo, Terry Luke
Techniques for Storing and Processing Next-Generation DNA Sequencing Data
author Camerlengo, Terry Luke
author_facet Camerlengo, Terry Luke
author_sort Camerlengo, Terry Luke
title Techniques for Storing and Processing Next-Generation DNA Sequencing Data
title_short Techniques for Storing and Processing Next-Generation DNA Sequencing Data
title_full Techniques for Storing and Processing Next-Generation DNA Sequencing Data
title_fullStr Techniques for Storing and Processing Next-Generation DNA Sequencing Data
title_full_unstemmed Techniques for Storing and Processing Next-Generation DNA Sequencing Data
title_sort techniques for storing and processing next-generation dna sequencing data
publisher The Ohio State University / OhioLINK
publishDate 2014
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159
work_keys_str_mv AT camerlengoterryluke techniquesforstoringandprocessingnextgenerationdnasequencingdata
_version_ 1719435280812867584