Study on Development of a Chinese Gene Variation Database

Master's === 國立陽明大學 === 醫學生物技術研究所 === 90 === Changes in genetic information are an important research resource for studies of evolution and genetic disease, and the establishment of variation databases is critical for the storage and exchange of variation information. Furthermore, it is essential for the study...


Bibliographic Details
Main Authors: Chien-Han Lin, 林千涵
Other Authors: Kwang-Jen Hsiao
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/01465228866967776222
id ndltd-TW-090YM000604024
record_format oai_dc
spelling ndltd-TW-090YM000604024 2016-06-24T04:15:13Z http://ndltd.ncl.edu.tw/handle/01465228866967776222 Study on Development of a Chinese Gene Variation Database 華人基因變異資料庫(CGVdb)建置之研究 Chien-Han Lin 林千涵 Master's 國立陽明大學 醫學生物技術研究所 90 Kwang-Jen Hsiao Ann-Ping Tsou 蕭廣仁 鄒安平 2002 學位論文 ; thesis 129 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 國立陽明大學 === 醫學生物技術研究所 === 90 === Changes in genetic information are an important research resource for studies of evolution and genetic disease, and the establishment of variation databases is critical for the storage and exchange of variation information. Furthermore, it is essential for the study of functional genomics. Variation databases can be categorized into locus-specific databases (LSDBs), which collect variations for a few gene loci; central mutation and SNP databases, such as Online Mendelian Inheritance in Man (OMIM) and the Human Gene Mutation Database (HGMD); and national and ethnic variation databases, such as the Arab Genetic Disease Database (AGDDB), which collect variations from a specific nation or population. The Chinese Gene Mutation Database (CGMD, http://cgmd.nhri.org.tw) was established by the research team of Professor Kuang-Dong Wuu and Professor Kwang-Jen Hsiao at the Institute of Genetics, National Yang-Ming University. It was compiled to provide information on inherited gene mutations in the Chinese population. The aim of this thesis is to establish a Chinese Gene Variation Database (CGVdb) on the basis of CGMD. The establishment of CGVdb followed the recommendations of the Human Genome Variation Society (HGVS, http://www.hgvs.org), including the Mutation Nomenclature Recommendations and the Database Content Recommendations for minimum (core) content and for essential and suggested fields. The source data of CGVdb are Chinese Gene Variation Reports (CVRs): reports of “inherited” mutations and variations related to a phenotype studied in the Chinese population, excluding somatic variations. The reports are collected from PubMed (http://www.ncbi.nlm.nih.gov/PubMed/), the publicly available MEDLINE database established by the National Library of Medicine (http://www.nlm.nih.gov). By combining PubMed search field descriptions with MeSH (Medical Subject Headings) terms, we customized a search string for semi-automated electronic data collection.
We also established a “Text Analysis Standard Operation Procedure” (Text Analysis SOP) to create CGVdb content from the abstracts of CVRs. To evaluate the effectiveness of our electronic data collection method and the Text Analysis SOP, we established a “standard data set” containing true CVRs. By manually reviewing 6,942 papers published in 1997 and 1998 in 18 selected journals (9 published abroad and 9 published in the Chinese region), we found 115 CVRs. Our electronic data collection method, in contrast, detected 266 papers from MEDLINE, with 3 false negatives (FN rate: 0.026) and 154 false positives (FP rate: 0.579). Using the optimized search string to search all papers in MEDLINE for 1997 and 1998, 782 reports were found, of which 251 were CVRs and 531 were false positives (FP rate: 0.679). This result indicates that the total number of CVRs in MEDLINE from 1960 to 2001 can be estimated at around 1,308, based on the 4,076 reports detected by our electronic method on Aug. 28, 2002. Applying the Text Analysis SOP to the 251 CVRs produced 502 Variation Records (V.R.). Of these, 43.2% (217/502) require full-text processing to obtain the minimal field information for CGVdb, and 56.8% (285/502) passed the Text Analysis SOP and could be put online in CGVdb. After V.R. describing the same variation were combined, the 285 V.R. yielded 222 CGV records. These records can be searched by gene name, gene symbol, phenotype, or CGV ID, and browsed by gene or phenotype, via the CGVdb web site: http://www.CGVdb.org.tw. We also established a value-added table, GeneLink, providing direct links to bioinformatics databases such as LocusLink, OMIM, HGMD, RefSeq, GDB, Swiss-Prot, GeneLynx, and locus-specific databases.
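The evaluation figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is not taken from the thesis; the variable names are illustrative, and the choice of denominators (FN rate over the manually found CVRs, FP rate over the retrieved papers) is an assumption that happens to reproduce the reported rates. The extrapolation to 1960-2001 likewise assumes that the share of true CVRs among retrieved reports in 1997-1998 holds for the whole period.

```python
# Counts as reported in the abstract; variable names are illustrative only.
manual_cvrs = 115            # true CVRs found by manual review of 18 journals
retrieved_standard = 266     # papers retrieved by the electronic method
false_negatives = 3
false_positives = 154

# Assumed denominators: FN over true CVRs, FP over retrieved papers.
fn_rate = false_negatives / manual_cvrs
fp_rate = false_positives / retrieved_standard

# Optimized search string applied to all of MEDLINE for 1997-1998.
retrieved_9798 = 782
cvrs_9798 = 251
fp_rate_all = (retrieved_9798 - cvrs_9798) / retrieved_9798

# Extrapolation, assuming the 1997-1998 share of true CVRs applies to 1960-2001.
retrieved_1960_2001 = 4076
estimated_cvrs = round(retrieved_1960_2001 * cvrs_9798 / retrieved_9798)

# Outcome of the Text Analysis SOP on the 251 CVRs.
variation_records = 502
need_full_text = 217
passed_sop = 285
pct_full_text = 100 * need_full_text / variation_records
pct_passed = 100 * passed_sop / variation_records

print(f"FN rate {fn_rate:.3f}, FP rate {fp_rate:.3f}, FP rate (all) {fp_rate_all:.3f}")
print(f"Estimated CVRs 1960-2001: {estimated_cvrs}")
print(f"Full text needed {pct_full_text:.1f}%, passed SOP {pct_passed:.1f}%")
```

Under these assumptions the counts are internally consistent: 266 retrieved papers minus 154 false positives leaves 112 true CVRs, which matches 115 manually found CVRs minus 3 false negatives.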
author2 Kwang-Jen Hsiao
author_facet Kwang-Jen Hsiao
Chien-Han Lin
林千涵
author Chien-Han Lin
林千涵
title Study on Development of a Chinese Gene Variation Database
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/01465228866967776222