Techniques for Storing and Processing Next-Generation DNA Sequencing Data
Main Author: | Camerlengo, Terry Luke |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2014 |
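Subjects: | Bioinformatics; DNA sequence storage; 4 bit encoding; reference-based compression; Needleman-Wunsch; DNA base pair compression; sequence compression; MongoDB; NGS; Data management; NoSQL; 3 bases per byte |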
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1388502159 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1388502159 2021-08-03T06:21:20Z Techniques for Storing and Processing Next-Generation DNA Sequencing Data Camerlengo, Terry Luke Bioinformatics DNA sequence storage 4 bit encoding reference-based compression Needleman-Wunsch DNA base pair compression sequence compression MongoDB NGS Data management bioinformatics NoSQL 3 bases per byte Genomics is undergoing unprecedented transformation due to rapid improvements in genetic sequencing technology, which has lowered costs for genetic sequencing experiments while increasing the amount of data generated in a typical experiment (McKinsey Global Institute, May 2013, pp. 86-94). The increase in data has shifted the burden from analysis and research to expertise in IT hardware and network support for distributed and efficient processing. Bioinformaticians, in response to a data-rich environment, are challenged to develop better and faster algorithms to solve problems in genomics and molecular biology research. This thesis examines the storage and data processing issues inherent in next-generation DNA sequencing (NGS). This work details the design and implementation of a software prototype that exemplifies the current approaches as they relate to the efficient storage of NGS data. The software library is used within the context of a previous software project that accompanies the publication related to the HT_SOSA assay. The software for the HT_SOSA, called NGSPositionCounter, demonstrates a workflow that is common in a molecular biology research lab. In an effort to scale beyond the research institute, the software library’s architecture takes into account scalability considerations for data storage and processing demands that are more likely to be encountered in a clinical or commercial enterprise.
2014-06-02 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159 unrestricted This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Bioinformatics DNA sequence storage 4 bit encoding reference-based compression Needleman-Wunsch DNA base pair compression sequence compression MongoDB NGS Data management bioinformatics NoSQL 3 bases per byte |
author |
Camerlengo, Terry Luke |
title |
Techniques for Storing and Processing Next-Generation DNA Sequencing Data |
publisher |
The Ohio State University / OhioLINK |
publishDate |
2014 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159 |