Systematic identification of conserved motif modules in the human genome

Abstract. Background: The identification of motif modules, groups of multiple motifs frequently occurring in DNA sequences, is one of the most important tasks necessary for annotating the human genome. Current approaches to identifying motif modules are o...

Bibliographic Details
Main Authors: Hu Haiyan, Su Naifang, Hou Lin, Cai Xiaohui, Deng Minghua, Li Xiaoman
Format: Article
Language: English
Published: BMC 2010-10-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/11/567
id doaj-befeb97109324486945224749b3a784e
record_format Article
spelling doaj-befeb97109324486945224749b3a784e; 2020-11-24T21:36:20Z; eng; BMC; BMC Genomics; 1471-2164; 2010-10-01; 11; 1; 567; 10.1186/1471-2164-11-567; Systematic identification of conserved motif modules in the human genome; Hu Haiyan; Su Naifang; Hou Lin; Cai Xiaohui; Deng Minghua; Li Xiaoman; http://www.biomedcentral.com/1471-2164/11/567
collection DOAJ
language English
format Article
sources DOAJ
author Hu Haiyan
Su Naifang
Hou Lin
Cai Xiaohui
Deng Minghua
Li Xiaoman
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2010-10-01
description Abstract. Background: The identification of motif modules, groups of multiple motifs frequently occurring in DNA sequences, is one of the most important tasks necessary for annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, promoter regions account for only a limited number of the locations where transcription factor binding sites can occur, and multiple genome alignments often cannot align binding sites with their true counterparts because these sites are short and degenerate. Results: To identify motif modules systematically, we developed a computational method that covers the entire non-coding regions around human genes and does not rely on multiple genome alignments. First, we selected orthologous DNA blocks approximately 1 kilobase in length based on discontiguous sequence similarity. Next, we scanned the conserved segments in these blocks using known motifs in the TRANSFAC database. Finally, a frequent pattern mining technique was applied to identify motif modules within these blocks. In total, with a false discovery rate cutoff of 0.05, we predicted 3,161,839 motif modules, 90.8% of which are supported by various forms of functional evidence. Compared with experimental data from 14 ChIP-seq experiments, on average, our methods predicted 69.6% of the ChIP-seq peaks with TFBSs of multiple TFs. Our findings also show that many motif modules have distance preference and order preference among the motifs, which further supports the functionality of these predictions. Conclusions: Our work provides a large-scale prediction of motif modules in mammals, which will facilitate the understanding of gene regulation in a systematic way.
url http://www.biomedcentral.com/1471-2164/11/567
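The frequent pattern mining step named in the abstract can be illustrated with a minimal Apriori-style sketch. This is not the authors' implementation: the function name `frequent_motif_modules`, the parameters `min_support` and `max_size`, and the motif names in the toy data are all assumptions made for illustration. Each conserved block is reduced to the set of motifs found in it, and a "module" is any set of two or more motifs that occurs together in at least `min_support` blocks. Distance preference, order preference, and the false discovery rate cutoff are not modeled here.

```python
from collections import Counter
from itertools import combinations


def frequent_motif_modules(blocks, min_support, max_size=3):
    """Return {motif_set: support} for motif sets of size 2..max_size
    that occur together in at least `min_support` blocks."""
    # Level 1: count how many blocks contain each single motif.
    counts = Counter(m for block in blocks for m in set(block))
    current = {frozenset([m]): c for m, c in counts.items() if c >= min_support}

    result = {}
    size = 1
    while current and size < max_size:
        size += 1
        # Only motifs that are part of a frequent smaller set can extend it.
        items = set().union(*current.keys())
        counts = Counter()
        for block in blocks:
            kept = sorted(set(block) & items)
            for combo in combinations(kept, size):
                # Apriori pruning: every (size-1)-subset must be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(combo, size - 1)):
                    counts[frozenset(combo)] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
    return result


# Toy input: hypothetical motif hits per conserved block.
blocks = [
    {"SP1", "NFKB", "AP1"},
    {"SP1", "NFKB"},
    {"SP1", "AP1", "NFKB"},
    {"AP1", "E2F"},
]
modules = frequent_motif_modules(blocks, min_support=2)
for motifs, support in sorted(modules.items(), key=lambda kv: -kv[1]):
    print(sorted(motifs), support)
```

On this toy input the pair SP1 and NFKB is found in three blocks, while E2F never joins a module because it appears in only one block. A genome-scale run would need a more memory-efficient miner and a significance test against a background model, neither of which this sketch attempts.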