Predicting target sequences of DNA-binding proteins based on primary structure
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps us understand how genes are regulated in cells and explains how genetic variations cause disruption of normal...
Main Authors: | Chih-Wei Lin, 林志瑋 |
---|---|
Other Authors: | 歐陽彥正 |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/79248267805763094339 |
id |
ndltd-TW-099NTU05392097 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099NTU05392097 2015-10-16T04:03:10Z http://ndltd.ncl.edu.tw/handle/79248267805763094339 Predicting target sequences of DNA-binding proteins based on primary structure 從一級結構預測DNA結合蛋白之標的序列 Chih-Wei Lin 林志瑋 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 99 Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps us understand how genes are regulated in cells and explains how genetic variations disrupt normal gene expression. Position frequency matrices (PFMs) are among the most widely used models for representing such target sequences. However, for most species, only a small fraction of transcription factors (TFs) have experimentally determined PFMs to date. Because biological experiments are usually time-consuming and costly, computational methods with satisfactory accuracy are strongly desired to speed up progress. Here, a new method is proposed to predict the quantitative specificities of protein-DNA interactions; it is based on existing protein-DNA complex structures and a knowledge base of contact preferences between amino acids and nucleotides. Given a query protein sequence, a protein-DNA complex structure of a homologous protein is selected as a template, and the PFM is predicted by combining the selected template with the knowledge base. The proposed method is evaluated on two datasets and compared with existing computational methods. It predicts as well as the structure-based methods it is compared with. Compared with a sequence-based method trained on collected, experimentally determined PFMs, however, the proposed method performs slightly worse. Even so, the proposed method remains valuable, since different predictors usually have their own advantages and limitations. In summary, a DNA-binding protein’s binding preference can be predicted from its primary structure using the complexes of its homologues. This facilitates future studies, because target sequences can now be predicted for proteins without a solved structure. 歐陽彥正 2011 thesis 41 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps us understand how genes are regulated in cells and explains how genetic variations disrupt normal gene expression. Position frequency matrices (PFMs) are among the most widely used models for representing such target sequences. However, for most species, only a small fraction of transcription factors (TFs) have experimentally determined PFMs to date. Because biological experiments are usually time-consuming and costly, computational methods with satisfactory accuracy are strongly desired to speed up progress. Here, a new method is proposed to predict the quantitative specificities of protein-DNA interactions; it is based on existing protein-DNA complex structures and a knowledge base of contact preferences between amino acids and nucleotides. Given a query protein sequence, a protein-DNA complex structure of a homologous protein is selected as a template, and the PFM is predicted by combining the selected template with the knowledge base.
The proposed method is evaluated on two datasets and compared with existing computational methods. It predicts as well as the structure-based methods it is compared with. Compared with a sequence-based method trained on collected, experimentally determined PFMs, however, the proposed method performs slightly worse. Even so, the proposed method remains valuable, since different predictors usually have their own advantages and limitations. In summary, a DNA-binding protein’s binding preference can be predicted from its primary structure using the complexes of its homologues. This facilitates future studies, because target sequences can now be predicted for proteins without a solved structure.
|
author2 |
歐陽彥正 |
author_facet |
歐陽彥正 Chih-Wei Lin 林志瑋 |
author |
Chih-Wei Lin 林志瑋 |
author_sort |
Chih-Wei Lin |
title |
Predicting target sequences of DNA-binding proteins based on primary structure |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/79248267805763094339 |
work_keys_str_mv |
AT chihweilin predictingtargetsequencesofdnabindingproteinsbasedonprimarystructure AT línzhìwěi predictingtargetsequencesofdnabindingproteinsbasedonprimarystructure AT chihweilin cóngyījíjiégòuyùcèdnajiéhédànbáizhībiāodexùliè AT línzhìwěi cóngyījíjiégòuyùcèdnajiéhédànbáizhībiāodexùliè |
_version_ |
1718091997694132224 |
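
The abstract names position frequency matrices (PFMs) as the model used to represent target sequences. As a rough illustration of what a PFM holds, the Python sketch below counts nucleotides per position over a few aligned binding sites and then normalizes the counts. The site strings, the function names `build_pfm` and `to_probabilities`, and the `pseudocount` parameter are assumptions made for this example. It is not the thesis's prediction method, which derives PFMs from template complex structures and a contact-preference knowledge base.

```python
NUCLEOTIDES = "ACGT"


def build_pfm(sites):
    """Count each nucleotide at each position of equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    # One row per nucleotide, one column per position in the site.
    pfm = {n: [0] * length for n in NUCLEOTIDES}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base not in pfm:
                raise ValueError(f"unexpected symbol {base!r} in site {site!r}")
            pfm[base][pos] += 1
    return pfm


def to_probabilities(pfm, pseudocount=0.0):
    """Turn counts into per-position frequencies, optionally smoothed."""
    length = len(pfm["A"])
    probs = {n: [0.0] * length for n in NUCLEOTIDES}
    for pos in range(length):
        total = sum(pfm[n][pos] for n in NUCLEOTIDES)
        total += pseudocount * len(NUCLEOTIDES)
        for n in NUCLEOTIDES:
            probs[n][pos] = (pfm[n][pos] + pseudocount) / total
    return probs


# Hypothetical toy sites, invented for illustration (not data from the thesis).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pfm = build_pfm(sites)
probs = to_probabilities(pfm)

for n in NUCLEOTIDES:
    print(n, pfm[n])
```

Each column of the matrix sums to the number of sites, so a column dominated by a single nucleotide marks a position where the protein is highly specific. A nonzero pseudocount keeps unseen nucleotides from getting a frequency of exactly zero when only a few sites are available.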