Summary: | Identification and characterization of regions influencing the precise spatial and temporal expression of genes is critical to our understanding of gene regulatory networks. Connecting transcription factors to the cis-regulatory elements that they bind and regulate remains a challenging problem in computational biology. The rapid accumulation of whole genome sequences and genome-wide expression data, and advances in alignment algorithms and motif-finding methods, provide opportunities to tackle the important task of dissecting how genes are regulated.
Genes exhibiting similar expression profiles are often regulated by common transcription factors. We developed a method for identifying statistically over-represented regulatory motifs in the promoters of co-expressed genes using weight matrix models representing the specificity of known factors. Application of this method to yeast fermenting in grape must revealed elements that play important roles in utilizing carbon sources. Extension of the method to metazoan genomes via incorporation of comparative sequence analysis facilitated identification of functionally relevant binding sites for sets of tissue-specific genes, and for genes showing similar expression in large-scale expression profiling studies. Further extensions address alternative promoters for human genes and coordinated binding of multiple transcription factors to cis-regulatory modules.
Sequence conservation reveals segments of genes of potential interest, but the degree of sequence divergence among human genes and their orthologous sequences varies widely. Genes with a small number of well-distinguished, highly conserved non-coding elements proximal to the transcription start site may be well-suited for targeted laboratory promoter characterization studies. We developed a “regulatory resolution” score to prioritize lists of genes for laboratory gene regulation studies based on the conservation profile of their promoters. Additionally, genome-wide comparisons of vertebrate genomes have revealed surprisingly large numbers of highly conserved non-coding elements (HCNEs) that cluster near genes associated with transcription and development. To further our understanding of the genomic organization of regulatory regions, we developed methods to identify HCNEs in insects. We find that HCNEs in insects are similar in function and organization to their vertebrate counterparts. Our data suggest that microsynteny in insects has been retained to keep large arrays of HCNEs intact, forming genomic regulatory blocks that surround the key developmental genes they regulate.
|