Web Data ETL System with Unsupervised Extraction

Master's thesis (碩士), National Central University (國立中央大學), Graduate Institute of Software Engineering (軟體工程研究所), academic year 106 (ROC calendar).

Bibliographic Details
Main Author: Yu-An Chou (周昱安)
Other Authors: Chia-Hui Chang (張嘉惠)
Format: Others
Language: zh-TW (Traditional Chinese, Taiwan)
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/d5yn85
Chinese Title: 非監督式網頁資料擷取、轉置、載入與輸出系統
Type: Thesis (學位論文)
Record ID: ndltd-TW-106NCU05392131
Abstract: The Web is now the most important and primary source of information, especially the deep Web. In web data extraction, page-level approaches are more comprehensive than record-level approaches because they can induce a more complete page schema, which allows all of the data on a page to be extracted. However, most research on web data extraction focuses on schema induction or extraction algorithms rather than on user-facing services. This thesis therefore presents an ETL (extract-transform-load) system with an automated crawler based on unsupervised extraction. Through a user-friendly GUI, users can extract web data and output it (e.g., as an API endpoint or a static export) without any programming. The goal is to simplify the management of the entire complex process and to make web data extraction convenient for the general public.