Summary: | Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 92 === With the growing popularity of the Internet, biological research institutions worldwide can publish their work electronically, which has led to rapid growth in online biomedical documents. Yet the sheer volume of available information makes it hard for scientists and researchers to efficiently discover significant knowledge, such as gene functions, protein-protein interactions, and biological pathways, in the biomedical literature. In this thesis, we propose a methodology that combines Information Extraction (IE) with a classifier to identify important gene function information by filtering and extracting gene and/or protein function annotations from unstructured biomedical documents.
The strategy proposed in this thesis comprises two independent components: classification and information extraction. In the classification component, the Naïve Bayes method is used to identify function sentences according to the feature list created in the previous phase; it classifies every candidate sentence as either “accepted” or “rejected”. Only “accepted” candidates are considered to carry a function annotation. The key task of the information extraction component is to merge repeated function information into a single entry and to identify new function information by means of natural language processing and a knowledge base.
|
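The classification step described in the summary can be illustrated with a minimal sketch. It assumes a multinomial Naïve Bayes model with add-one smoothing over a small keyword feature list; the feature terms, the training sentences, and the names `FEATURES`, `TRAIN`, `featurize`, `train`, and `classify` are all hypothetical and are not taken from the thesis, whose actual feature list and training data are not given here.

```python
import math
from collections import Counter

# Hypothetical feature list; the thesis builds its own in an earlier phase.
FEATURES = {"regulates", "encodes", "binds", "involved", "required", "catalyzes"}
LABELS = ("accepted", "rejected")

# Toy training sentences, invented purely for illustration.
TRAIN = [
    ("BRCA1 encodes a protein that is required for DNA repair.", "accepted"),
    ("The kinase binds the receptor and regulates cell growth.", "accepted"),
    ("This enzyme catalyzes the first step and is involved in glycolysis.", "accepted"),
    ("Samples were collected from twelve patients.", "rejected"),
    ("The authors thank the laboratory staff.", "rejected"),
    ("Cells were required to rest overnight before imaging.", "rejected"),
    ("Figure 2 shows the experimental setup.", "rejected"),
]


def featurize(sentence):
    """Count how often each feature-list term occurs in the sentence."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    return Counter(t for t in tokens if t in FEATURES)


def train(labeled_sentences):
    """Collect per-class sentence counts and per-class feature counts."""
    class_counts = Counter()
    term_counts = {label: Counter() for label in LABELS}
    for sentence, label in labeled_sentences:
        class_counts[label] += 1
        term_counts[label].update(featurize(sentence))
    return class_counts, term_counts


def classify(sentence, model):
    """Return the label with the highest smoothed log-posterior score."""
    class_counts, term_counts = model
    total = sum(class_counts.values())
    feats = featurize(sentence)
    best_label, best_score = None, -math.inf
    for label in LABELS:
        # Add-one smoothing on the class prior and on each term likelihood.
        score = math.log((class_counts[label] + 1) / (total + len(LABELS)))
        denom = sum(term_counts[label].values()) + len(FEATURES)
        for term, n in feats.items():
            score += n * math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
candidates = [
    "The gene encodes a protein that binds DNA and regulates transcription.",
    "Samples were stored at room temperature.",
]
# Keep only the candidates that the classifier accepts.
accepted = [s for s in candidates if classify(s, model) == "accepted"]
```

In this sketch only the sentences left in `accepted` would be passed on to the information extraction component; a sentence containing no feature terms falls back to the class prior, which here favours “rejected”.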