Motif representation and discovery

An important part of gene regulation is mediated by specific proteins, called transcription factors, which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBS) or, simply, motifs. Such binding sites are relat...

Bibliographic Details
Main Author: Carvalho, A.M.
Language: ENG
Published: 2011
Subjects:
Online Access: http://tel.archives-ouvertes.fr/tel-00755042
http://tel.archives-ouvertes.fr/docs/00/75/50/42/PDF/phdthesis-AMCarvalho.pdf
id ndltd-CCSD-oai-tel.archives-ouvertes.fr-tel-00755042
record_format oai_dc
spelling ndltd-CCSD-oai-tel.archives-ouvertes.fr-tel-00755042 2013-01-07T16:29:02Z 2011-07-01 ENG PhD thesis
collection NDLTD
language ENG
sources NDLTD
topic [INFO:INFO_BI] Computer Science/Bioinformatics
[SDV:BIBS] Life Sciences/Quantitative Methods
Motif representation
Discriminative learning
Bayesian network
Motif discovery
Combinatorial algorithm
Position specific prior
description An important part of gene regulation is mediated by specific proteins, called transcription factors, which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBSs) or, simply, motifs. Such binding sites are relatively short segments of DNA, normally 5 to 25 nucleotides long, over-represented in a set of co-regulated DNA sequences. There are two different problems in this setup: motif representation, which concerns the model that describes the TFBSs; and motif discovery, which focuses on unravelling TFBSs from a set of co-regulated DNA sequences. This thesis proposes a discriminative scoring criterion that culminates in a discriminative mixture of Bayesian networks to distinguish TFBSs from the background DNA. This new probabilistic model provides further evidence of non-additivity among binding site positions and offers superior discriminative power in TFBS detection. In addition, extra knowledge carefully selected from the literature was incorporated into TFBS discovery in order to capture a variety of characteristics of TFBS patterns. This extra knowledge was combined during the process of motif discovery, leading to results that are considerably more accurate than those achieved by methods that rely on the DNA sequence alone.
author Carvalho, A.M.
title Motif representation and discovery
publishDate 2011
url http://tel.archives-ouvertes.fr/tel-00755042
http://tel.archives-ouvertes.fr/docs/00/75/50/42/PDF/phdthesis-AMCarvalho.pdf