An Empirical Study of Korean Sentence Representation with Various Tokenizations

How the token unit is defined in a sentence is important in natural language processing tasks such as text classification, machine translation, and generation. Many recent studies have used subword tokenization in language models such as BERT, KoBERT, and ALBERT. Although these language models...

Bibliographic Details
Main Authors: Danbi Cho, Hyunyoung Lee, Seungshik Kang
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/10/7/845