Building Data Civilizer Pipelines with an Advanced Workflow Engine

© 2018 IEEE. In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data...

Bibliographic Details
Main Authors: Mansour, Essam (Author), Deng, Dong (Author), Fernandez, Raul Castro (Author), Qahtan, Abdulhakim A. (Author), Tao, Wenbo (Author), Abedjan, Ziawasch (Author), Elmagarmid, Ahmed (Author), Ilyas, Ihab F. (Author), Madden, Samuel (Author), Ouzzani, Mourad (Author), Stonebraker, Michael (Author), Tang, Nan (Author)
Format: Article
Language: English
Published: IEEE, 2021-11-09T13:26:50Z.
Online Access: Get fulltext
LEADER 01728 am a22002773u 4500
001 137857
042 |a dc 
100 1 0 |a Mansour, Essam  |e author 
700 1 0 |a Deng, Dong  |e author 
700 1 0 |a Fernandez, Raul Castro  |e author 
700 1 0 |a Qahtan, Abdulhakim A.  |e author 
700 1 0 |a Tao, Wenbo  |e author 
700 1 0 |a Abedjan, Ziawasch  |e author 
700 1 0 |a Elmagarmid, Ahmed  |e author 
700 1 0 |a Ilyas, Ihab F.  |e author 
700 1 0 |a Madden, Samuel  |e author 
700 1 0 |a Ouzzani, Mourad  |e author 
700 1 0 |a Stonebraker, Michael  |e author 
700 1 0 |a Tang, Nan  |e author 
245 0 0 |a Building Data Civilizer Pipelines with an Advanced Workflow Engine 
260 |b IEEE,   |c 2021-11-09T13:26:50Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/137857 
520 |a © 2018 IEEE. In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute and retrofit data preparation pipelines of different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets. 
546 |a en 
655 7 |a Article 
773 |t 10.1109/icde.2018.00184