TIP_finder: An HPC Software to Detect Transposable Element Insertion Polymorphisms in Large Genomic Datasets

Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. Their insertion polymorphisms may cause beneficial mutations, such as the creation of new gene functions, or deleterious ones in eukaryotes, e.g., different types of cancer in...

Bibliographic Details
Main Authors: Simon Orozco-Arias, Nicolas Tobon-Orozco, Johan S. Piña, Cristian Felipe Jiménez-Varón, Reinel Tabares-Soto, Romain Guyot
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Biology
Subjects:
HPC
Online Access: https://www.mdpi.com/2079-7737/9/9/281
id doaj-9ecdc7315f134346b388f1dd10310d31
record_format Article
spelling doaj-9ecdc7315f134346b388f1dd10310d31
2020-11-25T03:43:31Z
eng
MDPI AG
Biology
2079-7737
2020-09-01
volume 9, pages 281-281
DOI 10.3390/biology9090281
Simon Orozco-Arias (Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
Nicolas Tobon-Orozco (Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
Johan S. Piña (Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
Cristian Felipe Jiménez-Varón (Department of Physics and Mathematics, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
Reinel Tabares-Soto (Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
Romain Guyot (Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia)
collection DOAJ
language English
format Article
sources DOAJ
author Simon Orozco-Arias
Nicolas Tobon-Orozco
Johan S. Piña
Cristian Felipe Jiménez-Varón
Reinel Tabares-Soto
Romain Guyot
title TIP_finder: An HPC Software to Detect Transposable Element Insertion Polymorphisms in Large Genomic Datasets
publisher MDPI AG
series Biology
issn 2079-7737
publishDate 2020-09-01
description Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. Their insertion polymorphisms may cause beneficial mutations, such as the creation of new gene functions, or deleterious ones in eukaryotes, e.g., different types of cancer in humans. LTR retrotransposons, a particular type of TE, comprise almost 8% of the human genome. Among LTR retrotransposons, human endogenous retroviruses (HERVs) bear structural and functional similarities to retroviruses. Several tools allow the detection of transposon insertion polymorphisms (TIPs) but fail to efficiently analyze large genomes or large datasets. Here, we developed a computational tool, named TIP_finder, that detects mobile element insertions in very large genomes through high-performance computing (HPC) and parallel programming, using discordant read pair analysis. TIP_finder inputs are (i) short paired-end reads such as those obtained by Illumina sequencing, (ii) a chromosome-level reference genome sequence, and (iii) a database of consensus TE sequences. The HPC strategy we propose adds scalability and provides a useful tool to analyze huge genomic datasets in a reasonable running time. TIP_finder accelerates the detection of TIPs by up to 55 times in breast cancer datasets and 46 times in cancer-free datasets compared to the fastest available algorithms. TIP_finder applies a validated strategy to find TIPs, accelerates the process through HPC, and addresses the issue of runtime for large-scale analyses in the post-genomic era. TIP_finder version 1.0 is available at https://github.com/simonorozcoarias/TIP_finder.
topic TIP_finder
bioinformatics
HPC
parallel programming
polymorphism
HERV
url https://www.mdpi.com/2079-7737/9/9/281