Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document

A Malay compound noun is defined as a word form that arises when two or more words are combined into a single syntactic unit with a specific meaning. A compound noun acts as one unit and is spelled as separate words, unless it is an established compound that is written as a single word. The basic charact...

Bibliographic Details
Main Authors: Abu Bakar, Z (Author), Ismail, NK (Author), Rawi, MIM (Author)
Format: Article
Language: English
Published: 2017
Online Access: View Fulltext in Publisher
LEADER 02471nam a2200133Ia 4500
001 10.1088-1757-899X-226-1-012106
008 220223s2017 CNT 000 0 und d
245 1 0 |a Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document 
260 0 |c 2017 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1088/1757-899X/226/1/012106 
520 3 |a A Malay compound noun is defined as a word form that arises when two or more words are combined into a single syntactic unit with a specific meaning. A compound noun acts as one unit and is spelled as separate words, unless it is an established compound that is written as a single word. A basic characteristic of a compound noun that can be observed in Malay sentences is the frequency of that word in the text itself. Extraction of compound nouns is significant for subsequent research such as text summarization, grammar checking, sentiment analysis, machine translation and word categorization. Many research efforts have proposed extracting Malay compound nouns using linguistic approaches. Most existing methods address the extraction of bi-gram noun+noun compounds. However, the results still exhibit problems and leave room for improvement. This paper explores a linguistic method for extracting compound nouns from a standard Malay corpus. A standard dataset is used to provide a common platform for evaluating research on the recognition of compound nouns in Malay sentences. An improvement in the effectiveness of compound noun extraction is needed because the results can be compromised. Thus, this study proposes a modification of the linguistic approach in order to enhance compound noun extraction. Several pre-processing steps are involved, including normalization, tokenization and tagging. The first step that uses the linguistic approach in this study is Part-of-Speech (POS) tagging. Finally, we describe several rule-based approaches and modify the rules to capture the most relevant relation between the first word and the second word, which helps to address these problems. The effectiveness of the relations used in our study is measured using recall, precision and F1-score. Comparison with baseline values is essential because it shows whether the result has improved.
700 1 0 |a Abu Bakar, Z  |e author 
700 1 0 |a Ismail, NK  |e author 
700 1 0 |a Rawi, MIM  |e author
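
The abstract describes extracting bi-gram noun+noun compounds from POS-tagged text and scoring the output with precision, recall and F1-score. The following is a minimal sketch of that general idea only, not the authors' rule set, which this record does not reproduce. The tag names (`NN`, `DT`, `VB`, `IN`), the hand-tagged example sentence, the gold list and both function names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# A tagged token is a (word, POS tag) pair. The tag names used here are
# hypothetical; the record does not say which tagset the paper uses.
TaggedToken = Tuple[str, str]


def extract_noun_noun_bigrams(
    tagged: List[TaggedToken], noun_tag: str = "NN"
) -> List[str]:
    """Return adjacent noun+noun pairs as candidate compound nouns."""
    candidates = []
    # Walk over every pair of neighbouring tokens.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == noun_tag and t2 == noun_tag:
            candidates.append(f"{w1} {w2}")
    return candidates


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Compute set-based precision, recall and F1 against a gold list."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hand-tagged illustrative sentence (assumed tags, not real tagger output).
    sentence = [
        ("kereta", "NN"), ("api", "NN"), ("itu", "DT"), ("tiba", "VB"),
        ("di", "IN"), ("balai", "NN"), ("polis", "NN"),
    ]
    found = extract_noun_noun_bigrams(sentence)
    # Illustrative gold list; "guru besar" is not in the sentence at all.
    gold = ["kereta api", "balai polis", "guru besar"]
    print(found)
    print(precision_recall_f1(found, gold))
```

A single noun+noun rule like this one finds only adjacent noun pairs. Compounds built from other tag patterns would need additional rules, which is the kind of rule modification the abstract refers to.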