Identification of biomedical entities from Medline abstracts using a dictionary-based approach

The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for...

Bibliographic Details
Main Author: Skuland, Magnus
Format: Others
Language: English
Published: Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap 2005
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242
id ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242; 2013-01-08T13:26:31Z; Student thesis; info:eu-repo/semantics/bachelorThesis; text; Local ntnudaim:1056; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic ntnudaim
SIF2 datateknikk
Program- og informasjonssystemer
description The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and present them clearly to an end user. The developed prototype, named iMasterThesis, takes a dictionary-based approach. A dictionary of 21K gene names and 425K protein names was constructed automatically. By realizing the protein name dictionary as a multi-level tree structure of hash tables, the approach aims to support a more flexible and relaxed matching scheme than previous approaches. The system was evaluated against a gold standard of 101 expert-annotated Medline abstracts. It identifies protein and gene names in these abstracts with 10% recall and 14% precision. Further improvement of these results appears to require a higher-quality dictionary, possibly through manual inspection by domain experts. A graphical user interface that presents the most relevant identified terms to an end user has also been developed.
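The "multi-level tree structure of hash tables" mentioned in the description can be pictured with a minimal sketch. This is not the iMasterThesis implementation; it assumes one tree level per lower-cased, whitespace-separated token and a longest-match lookup, and the names `build_tree`, `find_names`, and `END` are hypothetical. Punctuation handling and any relaxed matching rules are left out.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sentinel key marking "a complete dictionary name ends here".
# It is not a string, so it can never collide with a text token.
END = object()


def build_tree(names: List[str]) -> Dict[Any, Any]:
    """Build nested hash tables, one level per lower-cased token of each name."""
    root: Dict[Any, Any] = {}
    for name in names:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = name  # remember the original spelling at the final level
    return root


def find_names(text: str, tree: Dict[Any, Any]) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token, name) for the longest match at each position."""
    tokens = text.lower().split()
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        node = tree
        best = None
        j = i
        # Walk down one hash table per token for as long as the path exists.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (i, j, node[END])
        if best is not None:
            matches.append(best)
            i = best[1]  # continue after the matched name
        else:
            i += 1
    return matches


if __name__ == "__main__":
    tree = build_tree(["p53", "tumor necrosis factor", "tumor necrosis factor alpha"])
    print(find_names("The p53 gene and tumor necrosis factor alpha levels", tree))
```

Each lookup step is a single hash-table access, so the cost of matching at one text position grows with the length of the longest candidate name rather than with the size of the dictionary.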
author Skuland, Magnus
title Identification of biomedical entities from Medline abstracts using a dictionary-based approach
publisher Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
publishDate 2005
url http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242