Predicting target sequences of DNA-binding proteins based on primary structure

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps to understand how genes are regulated in cells and to explain how genetic variations cause disruption of normal...

Bibliographic Details
Main Authors: Chih-Wei Lin, 林志瑋
Other Authors: 歐陽彥正
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/79248267805763094339
id ndltd-TW-099NTU05392097
record_format oai_dc
spelling ndltd-TW-099NTU05392097 2015-10-16T04:03:10Z http://ndltd.ncl.edu.tw/handle/79248267805763094339 Predicting target sequences of DNA-binding proteins based on primary structure 從一級結構預測DNA結合蛋白之標的序列 Chih-Wei Lin 林志瑋 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 99 歐陽彥正 2011 thesis 41 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps to understand how genes are regulated in cells and to explain how genetic variations disrupt normal gene expression. Position frequency matrices (PFMs) are among the most widely used models for representing such target sequences. However, for most species, only a small fraction of the transcription factors (TFs) have experimentally determined PFMs to date. Because biological experiments are usually time-consuming and costly, computational methods with satisfactory accuracy are strongly desired to speed up progress. Here, a new method is proposed to predict the quantitative specificities of protein-DNA interactions; it is based on existing protein-DNA complex structures and a knowledge base of contact preferences between amino acids and nucleotides. Given a query protein sequence, a protein-DNA complex structure of homologous proteins is selected as a template, and the PFM is predicted from that template in combination with the knowledge base. The proposed method is evaluated on two datasets and compared with existing computational methods. It predicts as well as the structure-based methods it is compared with. When compared with a sequence-based method trained on collected experimentally determined PFMs, however, the proposed method performs slightly worse. Even so, the proposed method retains its value, since different predictors usually have their own advantages and limitations. In summary, a DNA-binding protein’s binding preference can be predicted from its primary structure using the complexes of its homologues. This facilitates future studies, because the target sequences of proteins without a solved structure can now be predicted.
author2 歐陽彥正
author_facet 歐陽彥正
Chih-Wei Lin
林志瑋
author Chih-Wei Lin
林志瑋
spellingShingle Chih-Wei Lin
林志瑋
Predicting target sequences of DNA-binding proteins based on primary structure
author_sort Chih-Wei Lin
title Predicting target sequences of DNA-binding proteins based on primary structure
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/79248267805763094339