An Empirical Study of Korean Sentence Representation with Various Tokenizations
It is important how the token unit is defined in a sentence in natural language process tasks, such as text classification, machine translation, and generation. Many studies recently utilized the subword tokenization in language models such as BERT, KoBERT, and ALBERT. Although these language models...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2021-04-01
|
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/10/7/845 |