An Empirical Study of Korean Sentence Representation with Various Tokenizations
How the token unit is defined in a sentence matters in natural language processing tasks such as text classification, machine translation, and generation. Many recent studies use subword tokenization in language models such as BERT, KoBERT, and ALBERT. Although these language models...
Main Authors: | Danbi Cho, Hyunyoung Lee, Seungshik Kang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-04-01 |
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/10/7/845 |
Similar Items
- Ancient Korean Neural Machine Translation
  by: Chanjun Park, et al.
  Published: (2020-01-01)
- Tokenization of Assets: Security Tokens in Liechtenstein and Switzerland
  by: Angelika K. Layr
  Published: (2021-09-01)
- Token Phenomenon in Participatory Architectural Design and Sulukule Urban Transformation as a Tokenism Example
  by: Baharak Fareghi Bavilolyaei, et al.
  Published: (2018-07-01)
- Word and Relation Embedding for Sentence Representation
  Published: (2017)
- Low-Power Embedded DSP Core for Communication Systems
  by: Tsao Ya-Lan, et al.
  Published: (2003-01-01)