Web Data ETL System with Unsupervised Extraction

Master's === National Central University === Graduate Institute of Software Engineering === 106 === Nowadays, the Web is the most important means of obtaining information, especially the deep Web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate a more complete pa...


Bibliographic Details
Main Authors:	Yu-An Chou, 周昱安
Other Authors:	Chia-Hui Chang
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/d5yn85

id	ndltd-TW-106NCU05392131
record_format	oai_dc
spelling	ndltd-TW-106NCU05392131 2019-11-28T05:22:16Z http://ndltd.ncl.edu.tw/handle/d5yn85 Web Data ETL System with Unsupervised Extraction 非監督式網頁資料擷取、轉置、載入與輸出系統 (Unsupervised Web Data Extraction, Transformation, Loading, and Output System) Yu-An Chou 周昱安 Master's, National Central University, Graduate Institute of Software Engineering 106 Nowadays, the Web is the most important means of obtaining information, especially the deep Web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate a more complete page schema, allowing all the data on a page to be extracted. However, most research on web data extraction focuses on schema-induction or extraction algorithms rather than on end-user services. This thesis therefore presents an ETL (extract-transform-load) system with an automated crawler based on unsupervised extraction. Through a user-friendly GUI, users can extract web data and output it (e.g., as an API endpoint or a static export) without any programming. We hope this work simplifies the management of the entire complex process and makes web data extraction convenient for the general public. Chia-Hui Chang 張嘉惠 2018 degree thesis ; thesis 37 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Central University === Graduate Institute of Software Engineering === 106 === Nowadays, the Web is the most important means of obtaining information, especially the deep Web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate a more complete page schema, allowing all the data on a page to be extracted. However, most research on web data extraction focuses on schema-induction or extraction algorithms rather than on end-user services. This thesis therefore presents an ETL (extract-transform-load) system with an automated crawler based on unsupervised extraction. Through a user-friendly GUI, users can extract web data and output it (e.g., as an API endpoint or a static export) without any programming. We hope this work simplifies the management of the entire complex process and makes web data extraction convenient for the general public.
author2	Chia-Hui Chang
author_facet	Chia-Hui Chang Yu-An Chou 周昱安
author	Yu-An Chou 周昱安
spellingShingle	Yu-An Chou 周昱安 Web Data ETL System with Unsupervised Extraction
author_sort	Yu-An Chou
title	Web Data ETL System with Unsupervised Extraction
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/d5yn85
