A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis

Pali Sandhi is a phonetic transformation from two words into a new word. The phonemes of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that...

Bibliographic Details
Main Authors: Klangjai Tammanam, Nuttachot Promrit, Sajjaporn Waijanya
Format: Article
Language: English
Published: Khon Kaen University 2021-07-01
Series: Engineering and Applied Science Research
Subjects: bilstm; pali sandhi; thai pali; rule base; pali sandhi splitting
Online Access: https://ph01.tci-thaijo.org/index.php/easr/article/download/243815/166489/
id doaj-4314028e98404979931b179463334755
record_format Article
spelling doaj-4314028e98404979931b179463334755
2021-07-12T04:17:45Z
eng
Khon Kaen University
Engineering and Applied Science Research
2539-6161
2539-6218
2021-07-01
48
5
614
626
A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis
Klangjai Tammanam
Nuttachot Promrit
Sajjaporn Waijanya
Pali Sandhi is a phonetic transformation from two words into a new word. The phonemes of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying the sample Sandhi words into five classes with a bidirectional long short-term memory model. We applied the classified rules to rectify the words from the splitting locations. We identified 6,345 Pali Sandhi words from Dhammapada Atthakatha. We evaluated the performance of our proposed model on the basis of the accuracy of the splitting locations and compared the results with the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted as non-splitting location words and 5.83% were not matched with the answers (incomplete segmentation).
https://ph01.tci-thaijo.org/index.php/easr/article/download/243815/166489/
bilstm
pali sandhi
thai pali
rule base
pali sandhi splitting
collection DOAJ
language English
format Article
sources DOAJ
author Klangjai Tammanam
Nuttachot Promrit
Sajjaporn Waijanya
author_facet Klangjai Tammanam
Nuttachot Promrit
Sajjaporn Waijanya
author_sort Klangjai Tammanam
title A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis
publisher Khon Kaen University
series Engineering and Applied Science Research
issn 2539-6161
2539-6218
publishDate 2021-07-01
description Pali Sandhi is a phonetic transformation from two words into a new word. The phonemes of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying the sample Sandhi words into five classes with a bidirectional long short-term memory model. We applied the classified rules to rectify the words from the splitting locations. We identified 6,345 Pali Sandhi words from Dhammapada Atthakatha. We evaluated the performance of our proposed model on the basis of the accuracy of the splitting locations and compared the results with the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted as non-splitting location words and 5.83% were not matched with the answers (incomplete segmentation).
topic bilstm
pali sandhi
thai pali
rule base
pali sandhi splitting
url https://ph01.tci-thaijo.org/index.php/easr/article/download/243815/166489/
_version_ 1721307886050607104
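
The description field outlines a two-stage pipeline: a BiLSTM classifier predicts a splitting location and a rule class for each Sandhi word, and the rule for that class is then applied to restore the two original words. The following is a minimal sketch of the second (rule-based) stage only, not the authors' implementation. The rule table, the class labels, and the function name `rectify` are hypothetical, since the record does not list the paper's five classes, and the classifier output is assumed to be given as a character index plus a label.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule table (an assumption, not taken from the paper):
# class label -> (merged vowel in the Sandhi word,
#                 ending restored to the first word,
#                 beginning restored to the second word).
RULES: Dict[str, Tuple[str, str, str]] = {
    "a+i->e": ("e", "a", "i"),
    "a+u->o": ("o", "a", "u"),
}


def rectify(word: str, split_at: Optional[int], rule_class: str) -> Tuple[str, ...]:
    """Undo one Sandhi merge at index `split_at` using the rule for `rule_class`.

    `split_at` stands in for the classifier's predicted splitting location;
    None means the word was predicted to have no splitting location.
    """
    if split_at is None:
        return (word,)
    merged, left_end, right_start = RULES[rule_class]
    # The predicted location must actually hold the merged sound.
    if not word.startswith(merged, split_at):
        raise ValueError(f"rule {rule_class!r} does not apply at index {split_at}")
    left = word[:split_at] + left_end
    right = right_start + word[split_at + len(merged):]
    return (left, right)


if __name__ == "__main__":
    print(rectify("nopeti", 1, "a+u->o"))       # ('na', 'upeti')
    print(rectify("bandhusseva", 8, "a+i->e"))  # ('bandhussa', 'iva')
```

Keeping the rules in a table keyed by class label mirrors the division of labour in the description: the model only has to choose a location and a class, while the restoration of the original phonemes stays deterministic and easy to inspect.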