A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST).

In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model...

Bibliographic Details
Main Authors: Daniel J Reiss, Frederick M Howard, Harry L T Mobley
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3430675?pdf=render
id doaj-5767dbe86f8e479d99daf961752b2a3b
record_format Article
spelling doaj-5767dbe86f8e479d99daf961752b2a3b; 2020-11-25T01:52:50Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2012-01-01; 7(8): e42761; DOI 10.1371/journal.pone.0042761; A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST).; Daniel J Reiss; Frederick M Howard; Harry L T Mobley
collection DOAJ
language English
format Article
sources DOAJ
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2012-01-01
description In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java) so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5 × 10^12, 2.9 × 10^-46, and 1.2 × 10^-73) and informational content (11.0, 11.9, and 12.5 bits over 15 bp) of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp) consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp). TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.
url http://europepmc.org/articles/PMC3430675?pdf=render