A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequ...

Bibliographic Details
Main Authors: Steven Reisman, Thomas Hatzopoulos, Konstantin Läufer, George K. Thiruvathukal, Catherine Putonti
Format: Article
Language: English
Published: SAGE Publishing 2016-01-01
Series: Evolutionary Bioinformatics
Online Access: https://doi.org/10.4137/EBO.S32757
Author Affiliations:
Steven Reisman: Department of Biology, Loyola University Chicago, Chicago, IL, USA
Thomas Hatzopoulos: Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
Konstantin Läufer: Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
George K. Thiruvathukal: Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
Catherine Putonti: Department of Biology, Loyola University Chicago, Chicago, IL, USA
ISSN: 1176-9343
Volume: 12
Collection: DOAJ
Description: As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest.