LEADER 02127 am a22002413u 4500
001 143770
042    |a dc
100 10 |a Tang, Nan |e author
700 10 |a Fan, Ju |e author
700 10 |a Li, Fangyi |e author
700 10 |a Tu, Jianhong |e author
700 10 |a Du, Xiaoyong |e author
700 10 |a Li, Guoliang |e author
700 10 |a Madden, Sam |e author
700 10 |a Ouzzani, Mourad |e author
245 00 |a RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation
260    |b VLDB Endowment, |c 2022-07-15T16:13:16Z.
856    |z Get fulltext |u https://hdl.handle.net/1721.1/143770
520    |a Can AI help automate human-easy but computer-hard data preparation tasks that burden data scientists, practitioners, and crowd workers? We answer this question by presenting RPT, a denoising autoencoder for tuple-to-X models ("X" could be tuple, token, label, JSON, and so on). RPT is pre-trained as a tuple-to-tuple model by corrupting the input tuple and then learning a model to reconstruct the original tuple. It adopts a Transformer-based neural translation architecture consisting of a bidirectional encoder (similar to BERT) and a left-to-right autoregressive decoder (similar to GPT), yielding a generalization of both BERT and GPT. The pre-trained RPT can already support several common data preparation tasks such as data cleaning, auto-completion, and schema matching. Better still, RPT can be fine-tuned on a wide range of data preparation tasks, such as value normalization, data transformation, and data annotation. To complement RPT, we also discuss several appealing techniques, such as collaborative training and few-shot learning for entity resolution, and few-shot learning and NLP question answering for information extraction. In addition, we identify a series of research opportunities to advance the field of data preparation.
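The 520 abstract describes RPT's pre-training objective: serialize a tuple, corrupt it, and train an encoder-decoder to reconstruct the original. A minimal sketch of how such training pairs could be generated, assuming a simple value-masking corruption operator; the [ATT]/[VAL]/[MASK] tokens and the attribute-level masking scheme are illustrative assumptions, not the paper's actual vocabulary or corruption operators.

```python
import random

MASK = "[MASK]"

def serialize(tuple_dict):
    """Linearize an attribute-value tuple into a token sequence."""
    parts = []
    for attr, val in tuple_dict.items():
        parts += ["[ATT]", attr, "[VAL]", str(val)]
    return parts

def corrupt(tokens, mask_prob=0.3, rng=None):
    """Mask each attribute value independently with probability mask_prob."""
    rng = rng or random.Random(0)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "[VAL]":
            out.append(tokens[i])
            # Replace the value token with [MASK], or keep it as-is.
            out.append(MASK if rng.random() < mask_prob else tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

t = {"name": "Sam Madden", "affiliation": "MIT"}
src = corrupt(serialize(t), mask_prob=1.0)  # corrupted encoder input
tgt = serialize(t)                          # original decoder target
```

A seq2seq model trained on such (src, tgt) pairs learns to fill in masked values from the remaining attributes, which is what lets the pre-trained model serve data cleaning and auto-completion directly.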
546    |a en
655 7  |a Article
773    |t 10.14778/3457390.3457391
773    |t Proceedings of the VLDB Endowment