Identification of biomedical entities from Medline abstracts using a dictionary-based approach
The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for...
Main Author: | Skuland, Magnus |
---|---|
Format: | Others |
Language: | English |
Published: | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2005 |
Subjects: | ntnudaim; SIF2 datateknikk; Program- og informasjonssystemer |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242 |
id |
ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242; 2013-01-08T13:26:31Z; Identification of biomedical entities from Medline abstracts using a dictionary-based approach; eng; Skuland, Magnus; Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap; Institutt for datateknikk og informasjonsvitenskap; 2005; ntnudaim; SIF2 datateknikk; Program- og informasjonssystemer; The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for an end-user. The developed prototype, named iMasterThesis, uses a dictionary-based approach to the problem. A dictionary, consisting of 21K gene names and 425K protein names, was constructed automatically. By realizing the protein name dictionary as a multi-level tree structure of hash tables, the approach tries to facilitate a more flexible and relaxed matching scheme than previous approaches. The system was evaluated against a gold standard consisting of 101 expert-annotated Medline abstracts. It is capable of identifying protein and gene names from these abstracts with 10% recall and 14% precision. It seems clear that improving these results further requires a higher-quality dictionary, possibly obtained through manual inspection by domain experts. A graphical user interface, presenting an end-user with the most relevant terms identified, has been developed as well.; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242; Local ntnudaim:1056; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
ntnudaim; SIF2 datateknikk; Program- og informasjonssystemer |
spellingShingle |
ntnudaim; SIF2 datateknikk; Program- og informasjonssystemer; Skuland, Magnus; Identification of biomedical entities from Medline abstracts using a dictionary-based approach |
description |
The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for an end-user. The developed prototype, named iMasterThesis, uses a dictionary-based approach to the problem. A dictionary, consisting of 21K gene names and 425K protein names, was constructed automatically. By realizing the protein name dictionary as a multi-level tree structure of hash tables, the approach tries to facilitate a more flexible and relaxed matching scheme than previous approaches. The system was evaluated against a gold standard consisting of 101 expert-annotated Medline abstracts. It is capable of identifying protein and gene names from these abstracts with 10% recall and 14% precision. It seems clear that improving these results further requires a higher-quality dictionary, possibly obtained through manual inspection by domain experts. A graphical user interface, presenting an end-user with the most relevant terms identified, has been developed as well. |
author |
Skuland, Magnus |
title |
Identification of biomedical entities from Medline abstracts using a dictionary-based approach |
publisher |
Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap |
publishDate |
2005 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242 |
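The description above mentions a name dictionary realized as a multi-level tree structure of hash tables, used for relaxed matching. The record gives no implementation details, so the following is only a minimal sketch of that general idea and not the thesis's actual code: each tree level is a Python dict keyed by one token, and matching takes the longest dictionary name starting at each position. The names `tokenize`, `build_tree`, `find_entities`, and the `END` sentinel are hypothetical, and the tokenization rule (lowercasing, splitting letters from digits) is an assumed form of "relaxed" matching.

```python
import re

# Hypothetical sentinel key: marks that a complete dictionary name ends at this node.
# It cannot collide with a token because tokens are purely alphanumeric.
END = "$end"


def tokenize(text):
    """Lowercase and split into runs of letters or digits.

    This assumed normalization makes 'IL-2', 'IL2' and 'il 2' produce the same tokens.
    """
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def build_tree(names):
    """Build a nested dict where each level maps one token to the next level."""
    root = {}
    for name in names:
        node = root
        for token in tokenize(name):
            node = node.setdefault(token, {})
        node[END] = name  # remember the original spelling of the name
    return root


def find_entities(text, root):
    """Return (start_token_index, dictionary_name) pairs, longest match first."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        node = root
        best = None
        j = i
        # Walk down the tree as long as the next token is a key at this level.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])  # longest complete name seen so far
        if best:
            hits.append((i, best[1]))
            i = best[0]  # continue after the matched name
        else:
            i += 1
    return hits


if __name__ == "__main__":
    tree = build_tree(["interleukin 2", "IL-2 receptor", "p53"])
    print(find_entities("IL2 receptor and interleukin-2 were reduced.", tree))
```

Each lookup is a single hash-table access per token, so matching cost depends on the text length and the longest name rather than on the dictionary size. A sketch like this also makes the quality issue noted in the description visible: any ambiguous or overly general dictionary entry will match wherever its tokens occur.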