Thai Spelling Correction and Word Normalization on Social Text Using a Two-Stage Pipeline With Neural Contextual Attention

Text correction systems (e.g., spell checkers) have been used to improve the quality of computerized text by detecting and correcting errors. However, the task of performing spelling correction and word normalization (text correction) for Thai social media text has remained largely unexplored. In th...

Bibliographic Details
Main Authors: Anuruth Lertpiya, Tawunrat Chalothorn, Ekapol Chuangsuwanich
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
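Subjects: Natural language processing; machine learning; artificial neural networks; text generation; spelling correction; text normalization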
Online Access: https://ieeexplore.ieee.org/document/9145483/
id doaj-ad77cc203d494aa7a9164e9a59c7419d
record_format Article
spelling doaj-ad77cc203d494aa7a9164e9a59c7419d
2021-03-30T04:39:11Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
8
133403
133419
10.1109/ACCESS.2020.3010828
9145483
Thai Spelling Correction and Word Normalization on Social Text Using a Two-Stage Pipeline With Neural Contextual Attention
Anuruth Lertpiya 0 https://orcid.org/0000-0002-5699-8653
Tawunrat Chalothorn 1 https://orcid.org/0000-0003-4154-8745
Ekapol Chuangsuwanich 2 https://orcid.org/0000-0001-6104-4857
Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
Kasikorn Labs Co., Ltd., Kasikorn Business Technology Group, Nonthaburi, Thailand
Department of Computer Engineering, Faculty of Engineering, Chula Intelligent and Complex Systems, Chulalongkorn University, Bangkok, Thailand
https://ieeexplore.ieee.org/document/9145483/
Natural language processing
machine learning
artificial neural networks
text generation
spelling correction
text normalization
collection DOAJ
language English
format Article
sources DOAJ
author Anuruth Lertpiya
Tawunrat Chalothorn
Ekapol Chuangsuwanich
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Text correction systems (e.g., spell checkers) have been used to improve the quality of computerized text by detecting and correcting errors. However, the task of performing spelling correction and word normalization (text correction) for Thai social media text has remained largely unexplored. In this paper, we investigated how current text correction systems perform on correcting errors and word variances in Thai social texts and proposed a method designed for this task. We have found that currently available Thai text correction systems are insufficiently robust for correcting spelling errors and word variances, while the text correctors designed for English grammatical error correction suffer from overcorrections (text rewrites). Thus, we proposed a neural-based text corrector with a two-stage structure to alleviate issues of overcorrections while exploiting the benefits of a neural Seq2Seq corrector. Our method consists of a neural-based error detector and a Seq2Seq neural error corrector with contextual attention. This novel architecture allows the Seq2Seq network to produce corrections based on both the erroneous text and its context without the need for an end-to-end structure. Our method outperformed all the other evaluated text correction systems. When compared to the second-best result (copy-augmented transformer), our method further reduced the word error rate (WER) from 2.51% to 2.07%, improved the generalized language evaluation understanding (GLEU) score from 0.9409 to 0.9502 on the Thai text correction task, and improved the GLEU score from 0.7409 to 0.7539 on the English spelling correction task.
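The description reports results as word error rate (WER). As a reading aid, here is a minimal Python sketch of the textbook definition of WER: the word-level edit distance between a reference and a system output, divided by the reference length. This record does not state how the authors tokenized the text or computed the metric, so the function names `word_error_rate` and `relative_reduction`, the pre-tokenized list inputs, and the toy tokens are all assumptions made for illustration only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference tokens.

    Both arguments are sequences of tokens. Tokenization is left to the
    caller (an assumption; Thai text has no whitespace word boundaries,
    so a word segmenter would be needed upstream).
    """
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy tokens, not taken from the paper's data.
    print(word_error_rate(["a", "b", "c", "d"], ["a", "x", "c", "d"]))
    # WER figures quoted in the description above: 2.51% and 2.07%.
    print(round(relative_reduction(2.51, 2.07), 4))
```

Applied to the figures quoted in the description, going from 2.51% to 2.07% WER corresponds to a relative reduction of roughly 17.5%.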
topic Natural language processing
machine learning
artificial neural networks
text generation
spelling correction
text normalization
url https://ieeexplore.ieee.org/document/9145483/