LEADER 02127 am a22002413u 4500
001 143770
042    |a dc
100 10 |a Tang, Nan |e author
700 10 |a Fan, Ju |e author
700 10 |a Li, Fangyi |e author
700 10 |a Tu, Jianhong |e author
700 10 |a Du, Xiaoyong |e author
700 10 |a Li, Guoliang |e author
700 10 |a Madden, Sam |e author
700 10 |a Ouzzani, Mourad |e author
245 00 |a RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation
260    |b VLDB Endowment, |c 2022-07-15T16:13:16Z.
856    |z Get fulltext |u https://hdl.handle.net/1721.1/143770
520    |a Can AI help automate human-easy but computer-hard data preparation tasks that burden data scientists, practitioners, and crowd workers? We answer this question by presenting RPT, a denoising autoencoder for tuple-to-X models ("X" could be tuple, token, label, JSON, and so on). RPT is pre-trained as a tuple-to-tuple model by corrupting the input tuple and then learning a model to reconstruct the original tuple. It adopts a Transformer-based neural translation architecture consisting of a bidirectional encoder (similar to BERT) and a left-to-right autoregressive decoder (similar to GPT), yielding a generalization of both BERT and GPT. The pre-trained RPT can already support several common data preparation tasks such as data cleaning, auto-completion, and schema matching. Better still, RPT can be fine-tuned on a wide range of data preparation tasks, such as value normalization, data transformation, and data annotation. To complement RPT, we also discuss several appealing techniques, such as collaborative training and few-shot learning for entity resolution, and few-shot learning and NLP question answering for information extraction. In addition, we identify a series of research opportunities to advance the field of data preparation.
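The 520 abstract describes RPT's pre-training objective: serialize a tuple, corrupt it, and train an encoder-decoder to reconstruct the original. A minimal sketch of how such training pairs could be generated, assuming a simple value-masking corruption operator; the [ATT]/[VAL]/[MASK] tokens and the attribute-level masking scheme are illustrative assumptions, not the paper's actual vocabulary or corruption operators.

```python
import random

MASK = "[MASK]"

def serialize(tuple_dict):
    """Linearize an attribute-value tuple into a token sequence."""
    parts = []
    for attr, val in tuple_dict.items():
        parts += ["[ATT]", attr, "[VAL]", str(val)]
    return parts

def corrupt(tokens, mask_prob=0.3, rng=None):
    """Mask each attribute value independently with probability mask_prob."""
    rng = rng or random.Random(0)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "[VAL]":
            out.append(tokens[i])
            # Replace the value token with [MASK], or keep it as-is.
            out.append(MASK if rng.random() < mask_prob else tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

t = {"name": "Sam Madden", "affiliation": "MIT"}
src = corrupt(serialize(t), mask_prob=1.0)  # corrupted encoder input
tgt = serialize(t)                          # original decoder target
```

A seq2seq model trained on such (src, tgt) pairs learns to fill in masked values from the remaining attributes, which is what lets the pre-trained model serve data cleaning and auto-completion directly.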
546    |a en
655 7  |a Article
773    |t 10.14778/3457390.3457391
773    |t Proceedings of the VLDB Endowment