Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document

A Malay compound noun is defined as a form in which two or more words are combined into a single syntactic unit that carries a specific meaning. A compound noun acts as one unit and is spelled as separate words, unless it is an established compound noun, which is written as a single word. The basic charact...


Bibliographic Details
Main Authors: Bakar, Z.A (Author), Ismail, N.K (Author), Rawi, M.I.M (Author)
Format: Article
Language: English
Published: Institute of Physics Publishing 2017
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 03128nas a2200277Ia 4500
001 10.1088-1757-899X-226-1-012106
008 220120c20179999CNT?? ? 0 0und d
020 |a 17578981 (ISSN) 
245 1 0 |a Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document 
260 0 |b Institute of Physics Publishing  |c 2017 
520 3 |a A Malay compound noun is defined as a form in which two or more words are combined into a single syntactic unit that carries a specific meaning. A compound noun acts as one unit and is spelled as separate words, unless it is an established compound noun, which is written as a single word. One basic characteristic of a compound noun that can be observed in Malay sentences is the frequency of the word in the text itself. The extraction of compound nouns is significant for downstream research such as text summarization, grammar checking, sentiment analysis, machine translation and word categorization. Many research efforts have proposed linguistic approaches for extracting Malay compound nouns. Most of the existing methods extract bi-gram noun+noun compounds. However, their results still show problems and leave room for improvement. This paper explores a linguistic method for extracting compound nouns from a standard Malay corpus. A standard dataset is used to provide a common platform for evaluating research on the recognition of compound nouns in Malay sentences. The effectiveness of compound noun extraction needs to be improved, because the results can otherwise be compromised. Thus, this study proposes a modified linguistic approach to enhance the extraction of compound nouns. Several pre-processing steps are involved, including normalization, tokenization and tagging. The first step in this study that uses the linguistic approach is Part-of-Speech (POS) tagging. Finally, we describe several rules and modify them to capture the most relevant relation between the first word and the second word. The effectiveness of the relations used in the study is measured using recall, precision and F1-score. Comparison with the baseline values is essential because it shows whether the results have improved. © Published under licence by IOP Publishing Ltd. 
650 0 4 |a Basic characteristics 
650 0 4 |a Extracting compounds 
650 0 4 |a Extraction 
650 0 4 |a Linguistic approach 
650 0 4 |a Linguistics 
650 0 4 |a Machine translations 
650 0 4 |a Part of speech tagging 
650 0 4 |a Pre-processing step 
650 0 4 |a Relevant relations 
650 0 4 |a Rule-based approach 
700 1 0 |a Bakar, Z.A.  |e author 
700 1 0 |a Ismail, N.K.  |e author 
700 1 0 |a Rawi, M.I.M.  |e author 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1088/1757-899X/226/1/012106 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85028680673&doi=10.1088%2f1757-899X%2f226%2f1%2f012106&partnerID=40&md5=88dbf13ae59aadb278c78908876af079
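
Illustrative note: the abstract in this record describes a pipeline of normalization, tokenization, POS tagging, rule-based extraction of bi-gram noun+noun compounds, and evaluation with precision, recall and F1-score. The Python sketch below shows what such a pipeline might look like in miniature. It is not the authors' implementation: the toy lexicon TOY_LEXICON, the tag names, the single noun+noun rule, the sample sentence and the gold list are all assumptions made for illustration, and a real system would rely on a trained Malay POS tagger and the paper's own rule set.

```python
import re

# Hypothetical toy POS lexicon (assumption); a real system would use a Malay POS tagger.
TOY_LEXICON = {
    "kereta": "NOUN", "api": "NOUN", "itu": "DET", "tiba": "VERB",
    "di": "PREP", "balai": "NOUN", "polis": "NOUN",
}


def normalize(text):
    """Lowercase and trim the input text."""
    return text.lower().strip()


def tokenize(text):
    """Split normalized text into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text)


def tag(tokens):
    """Attach a POS tag to each token; unknown words get the tag 'UNK'."""
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in tokens]


def extract_noun_noun(tagged):
    """Apply one rule: two adjacent NOUN tokens form a compound noun candidate."""
    return [
        f"{w1} {w2}"
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "NOUN" and t2 == "NOUN"
    ]


def extract(sentence):
    """Run the full sketch pipeline on one sentence."""
    return extract_noun_noun(tag(tokenize(normalize(sentence))))


def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over sets of extracted compounds."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    candidates = extract("Kereta api itu tiba di balai polis.")
    # Hypothetical gold annotations, for illustration only.
    gold = ["kereta api", "balai polis", "guru besar"]
    print(candidates)
    print(precision_recall_f1(candidates, gold))
```

In this sketch the sample sentence yields the candidates "kereta api" and "balai polis". Comparing the scores of a modified rule set against those of a baseline rule set on the same gold list is one way to check whether a rule change has improved extraction.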