A Lexicon for Gene Normalization

Bibliographic Details
Main Author:	Lingemark, Maria
Format:	Others
Language:	English
Published:	Linköpings universitet, Institutionen för datavetenskap 2009
Subjects:	Bioinformatics Gene Normalization String Matching Text Mining Bioinformatik
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-20250

Description
Summary:	Researchers tend to use their own or favourite gene names in scientific literature, even though official names exist. Some names are even used for more than one gene. This makes automatically mining biological literature difficult because of ambiguity. Gene normalization is used to disambiguate gene names. In this thesis, we look into an existing gene normalization system and develop a new method to find gene candidates for ambiguous genes. For the new method, a lexicon is created from the gene names, symbols and synonyms in three different databases. Each gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method improves the system's overall result, with an increase in precision and a small decrease in recall.
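
The summary describes the candidate lookup only at a high level. The following is a minimal Python sketch of such a lexicon. It assumes a simple exact match after lowercasing and stripping non-alphanumeric characters. The thesis's actual string-matching rules, the three databases and their record layouts are not given here, so the helper names (`normalize`, `build_lexicon`, `candidates`), the gene IDs and the gene names are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_lexicon(sources):
    """Map each normalized name, symbol or synonym to the set of gene IDs it can denote.

    `sources` is an iterable of databases. Each database is an iterable of
    (gene_id, names) pairs, where `names` holds that gene's name, symbol and synonyms.
    """
    lexicon = defaultdict(set)
    for database in sources:
        for gene_id, names in database:
            for name in names:
                # The same key may collect IDs from several genes: this is the ambiguity.
                lexicon[normalize(name)].add(gene_id)
    return lexicon


def candidates(lexicon, mention: str):
    """Return all gene IDs whose lexicon entries match the mention after normalization."""
    # dict.get does not trigger the defaultdict factory, so unknown mentions add no keys.
    return sorted(lexicon.get(normalize(mention), set()))


# Usage with made-up records from three hypothetical databases.
db_a = [("ID1", ["alpha factor", "AF1", "AF-1"])]
db_b = [("ID2", ["AF1", "alpha factor like"])]
db_c = [("ID1", ["AF 1"])]
lex = build_lexicon([db_a, db_b, db_c])

print(candidates(lex, "af-1"))  # ['ID1', 'ID2']
print(candidates(lex, "Alpha Factor"))  # ['ID1']
```

Each key maps to a set of gene IDs. That design choice lets an ambiguous mention such as "AF-1" return several candidates. The system's disambiguation step would then choose among them.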

Similar Items