Constructing K562 transcriptional binding profiles using deep learning

Bibliographic Details
Main Authors: Rou-An Shen, 沈柔安
Other Authors: 陳倩瑜
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/n9e3gx
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Bio-Industrial Mechatronics Engineering === 106 === The study of gene regulation has broad biological significance and is an important research topic in genetics and molecular biology. Gene regulation, the mechanism that controls gene expression in organisms, gives rise to differences between species and between individuals of the same species. Among the different kinds of gene regulation activity, many studies focus on the activation or inhibition of a nearby gene that results from the interaction between transcription factors and their binding sites. This thesis uses a deep learning framework to characterize transcription factor binding profiles in a specific cell type and to improve prediction accuracy through feature learning. A database was also established based on this model for easy access to the data. The analysis uses chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE database, with the K562 cell line selected for learning and prediction. ChIP-seq identifies the sites where a particular protein binds to human DNA, from which that protein's role in gene regulation can be observed. Using the ChIP-seq data and a set of deep convolutional neural network models trained for the K562 cell line, the approach in this study can accurately predict the transcription factor binding sites of a specific cell type, as well as the effect of sequence variation on transcription factor binding affinity. The database saves users the time of comparing transcription factor binding features against the whole genome, and lets them enter a position on a specific chromosome to query the transcription factors that may affect gene regulation there. As an important step toward detecting disease-specific genes, the results of this thesis can serve as a basis for future bioinformatics research and applications related to transcription factor binding sites.
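The summary does not describe the network architecture, so the following is only a minimal sketch of the general idea behind scoring a sequence with a convolutional filter and estimating the effect of a single-base variant. Everything in it is an assumption made for illustration: the function names, the single hand-built filter (`toy_kernel`, which simply rewards the motif "GATA"), and the toy sequence. A model of the kind described in the thesis would learn many such filters from ChIP-seq peaks rather than use a fixed one.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded

def conv_scan(encoded, kernel):
    """Slide a (width, 4) filter along the sequence, as a 1-D convolutional layer would."""
    width = kernel.shape[0]
    n_positions = encoded.shape[0] - width + 1
    return np.array([np.sum(encoded[i:i + width] * kernel) for i in range(n_positions)])

def binding_score(seq, kernel):
    """Global max pooling over the filter activations, clipped at zero (ReLU)."""
    return max(0.0, float(conv_scan(one_hot(seq), kernel).max()))

def variant_effect(seq, pos, alt, kernel):
    """Change in score when the base at 0-based position `pos` is replaced by `alt`."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return binding_score(mutated, kernel) - binding_score(seq, kernel)

# Hypothetical filter: a perfect match to "GATA" scores 4 (not a trained weight matrix).
toy_kernel = one_hot("GATA")

reference = "CCGATACC"
print(binding_score(reference, toy_kernel))           # score of the reference sequence
print(variant_effect(reference, 3, "C", toy_kernel))  # negative value: the variant weakens the match
```

In this toy setting a negative `variant_effect` means the substitution disrupts the motif match; with a trained model the same difference-of-scores idea would be applied to the predicted binding probabilities.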