Identification of biomedical entities from Medline abstracts using a dictionary-based approach

The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for...


Bibliographic Details
Main Author:	Skuland, Magnus
Format:	Others
Language:	English
Published:	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap 2005
Subjects:	ntnudaim SIF2 datateknikk Program- og informasjonssystemer
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242

id	ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-ntnu-9242 | 2013-01-08T13:26:31Z | Identification of biomedical entities from Medline abstracts using a dictionary-based approach | eng | Skuland, Magnus | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap | Institutt for datateknikk og informasjonsvitenskap | 2005 | ntnudaim | SIF2 datateknikk | Program- og informasjonssystemer | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242 | Local ntnudaim:1056 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	ntnudaim SIF2 datateknikk Program- og informasjonssystemer
description	The aim of this paper was to develop a system for identifying biomedical entities, such as protein and gene names, in a corpus of Medline abstracts. A second aim was to extract the most relevant terms from the set of identified biomedical terms and make them readily presentable for an end user. The developed prototype, named iMasterThesis, takes a dictionary-based approach to the problem. A dictionary consisting of 21K gene names and 425K protein names was constructed automatically. By realizing the protein name dictionary as a multi-level tree structure of hash tables, the approach tries to facilitate a more flexible and relaxed matching scheme than previous approaches. The system was evaluated against a gold standard consisting of 101 expert-annotated Medline abstracts. It identifies protein and gene names in these abstracts with 10% recall and 14% precision. Further improvement of these results appears to require a higher-quality dictionary, possibly obtained through manual inspection by domain experts. A graphical user interface that presents the end user with the most relevant identified terms has been developed as well.
author	Skuland, Magnus
title	Identification of biomedical entities from Medline abstracts using a dictionary-based approach
publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
publishDate	2005
url	http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9242


Similar Items