A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval

Bibliographic Details
Main Authors: Hasan Abood Rehab, Tiun Sabrina
Format: Article
Language: English
Published: EDP Sciences 2017-01-01
Series: MATEC Web of Conferences
Online Access: https://doi.org/10.1051/matecconf/201713500071
Description
Summary: Information retrieval is the process of analysing a typed query and retrieving the documents relevant to it. Several issues can significantly affect retrieval effectiveness. A common one is lexical ambiguity, where a single word can have several meanings. The process of identifying the intended sense of a word is called Word Sense Disambiguation (WSD). The Quran is the holy book for nearly 1.5 billion Muslims around the world, and it contains numerous words that can carry multiple meanings. There is therefore a strong need to apply WSD to the Quran in order to improve information retrieval. Several WSD approaches have been proposed for Quranic retrieval; they fall into two main categories: open-domain WSD and domain-specific WSD. Open-domain WSD relies on an open-domain dictionary such as WordNet to determine the intended sense, whereas domain-specific WSD uses restricted training data containing senses specific to the domain of the Quran. This study compares the two categories. For the domain-specific approach, a predefined set of examples was collected to train the Yarowsky algorithm, a semi-supervised machine learning technique; once trained, the algorithm classifies the sense of each word. For the open-domain approach, WordNet was used together with semantic distance measures to identify the similarity between the query word and the concepts returned by WordNet. The dataset used in this study is a translation of the Quran. The experimental results were mixed: neither the Yarowsky algorithm nor the WordNet-based WSD approach was consistently superior.
ISSN: 2261-236X
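
Illustration: The summary describes training the Yarowsky algorithm from a small set of predefined examples. The record gives no implementation details, so the following is only a minimal sketch of Yarowsky-style bootstrapping under stated assumptions: contexts are bags of words, the decision list is scored with a smoothed log-likelihood ratio, and labels are frozen once assigned (the original algorithm may relabel on each pass). The function names, the threshold, the smoothing constant, and the toy "bank" contexts are all hypothetical and are not taken from the study or its Quranic dataset.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(labeled, smoothing=0.1):
    """Build a decision list of (score, context word, sense) rules.

    The score is a smoothed log-likelihood ratio: how much more often the
    context word co-occurs with this sense than with any other sense.
    """
    counts = defaultdict(Counter)  # context word -> sense -> count
    senses = set()
    for tokens, sense in labeled:
        senses.add(sense)
        for tok in set(tokens):
            counts[tok][sense] += 1

    rules = []
    for feat, by_sense in counts.items():
        total = sum(by_sense.values())
        for sense in senses:
            hit = by_sense[sense] + smoothing
            miss = (total - by_sense[sense]) + smoothing
            rules.append((math.log(hit / miss), feat, sense))
    # Strongest evidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rules


def classify(rules, tokens, threshold):
    """Return the sense of the first confident rule that fires, else None."""
    present = set(tokens)
    for score, feat, sense in rules:
        if score < threshold:
            break  # remaining rules are even weaker
        if feat in present:
            return sense
    return None


def yarowsky_bootstrap(seeds, unlabeled, threshold=1.0, max_iter=10):
    """Grow the labeled set from seed examples by self-training.

    Simplification (assumption): once a context is labeled it is never
    relabeled, which keeps the sketch short.
    """
    labeled, pool = list(seeds), list(unlabeled)
    for _ in range(max_iter):
        rules = train_decision_list(labeled)
        newly, remaining = [], []
        for tokens in pool:
            sense = classify(rules, tokens, threshold)
            if sense is None:
                remaining.append(tokens)
            else:
                newly.append((tokens, sense))
        if not newly:
            break  # converged: nothing else can be labeled confidently
        labeled.extend(newly)
        pool = remaining
    return train_decision_list(labeled), labeled, pool


# Toy contexts for the ambiguous word "bank" (invented for illustration).
SEEDS = [
    (["river", "water", "shore"], "river"),
    (["money", "loan", "deposit"], "finance"),
]
UNLABELED = [
    ["water", "fish", "mud"],
    ["loan", "interest", "rate"],
    ["fish", "boat"],
    ["interest", "account"],
    ["tree", "sky"],
]

rules, labeled, leftover = yarowsky_bootstrap(SEEDS, UNLABELED)
print(classify(rules, ["fish", "net"], threshold=1.0))
print(leftover)
```

In this toy run, contexts that share a word with a seed are labeled first, and the words they bring along ("fish", "interest") then let later passes label contexts that share nothing with the seeds. A context with no overlapping evidence stays unlabeled rather than being forced into a sense.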