Web Data ETL System with Unsupervised Extraction
Master's thesis === National Central University === Graduate Institute of Software Engineering === 106 === The web is today's most important and primary channel for obtaining information, especially the deep web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate more complete pa...
Main Authors: | Yu-An Chou, 周昱安 |
---|---|
Other Authors: | Chia-Hui Chang |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/d5yn85 |
id |
ndltd-TW-106NCU05392131 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NCU05392131 2019-11-28T05:22:16Z http://ndltd.ncl.edu.tw/handle/d5yn85 Web Data ETL System with Unsupervised Extraction 非監督式網頁資料擷取、轉置、載入與輸出系統 Yu-An Chou 周昱安 Master's thesis National Central University Graduate Institute of Software Engineering 106 The web is today's most important and primary channel for obtaining information, especially the deep web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate a more complete page schema for extracting all the data on a page. Moreover, most research on web data extraction focuses on algorithms for schema induction or extraction rather than on user-facing services. This thesis therefore presents an ETL (extract-transform-load) system with an automated crawler based on unsupervised extraction. Users can extract web data and output it (e.g., as an API endpoint or a static export) through a user-friendly GUI, without any programming. We hope this work simplifies the management of the entire complex process and makes web data extraction convenient for the general public. Chia-Hui Chang 張嘉惠 2018 學位論文 ; thesis 37 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Central University === Graduate Institute of Software Engineering === 106 === The web is today's most important and primary channel for obtaining information, especially the deep web. In web data extraction, the page-level approach is a more comprehensive solution than the record-level approach because it can generate a more complete page schema for extracting all the data on a page.
Moreover, most research on web data extraction focuses on algorithms for schema induction or extraction rather than on user-facing services. This thesis therefore presents an ETL (extract-transform-load) system with an automated crawler based on unsupervised extraction. Users can extract web data and output it (e.g., as an API endpoint or a static export) through a user-friendly GUI, without any programming. We hope this work simplifies the management of the entire complex process and makes web data extraction convenient for the general public.
|
author2 |
Chia-Hui Chang |
author_facet |
Chia-Hui Chang Yu-An Chou 周昱安 |
author |
Yu-An Chou 周昱安 |
spellingShingle |
Yu-An Chou 周昱安 Web Data ETL System with Unsupervised Extraction |
author_sort |
Yu-An Chou |
title |
Web Data ETL System with Unsupervised Extraction |
title_short |
Web Data ETL System with Unsupervised Extraction |
title_full |
Web Data ETL System with Unsupervised Extraction |
title_fullStr |
Web Data ETL System with Unsupervised Extraction |
title_full_unstemmed |
Web Data ETL System with Unsupervised Extraction |
title_sort |
web data etl system with unsupervised extraction |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/d5yn85 |
work_keys_str_mv |
AT yuanchou webdataetlsystemwithunsupervisedextraction AT zhōuyùān webdataetlsystemwithunsupervisedextraction AT yuanchou fēijiāndūshìwǎngyèzīliàoxiéqǔzhuǎnzhìzàirùyǔshūchūxìtǒng AT zhōuyùān fēijiāndūshìwǎngyèzīliàoxiéqǔzhuǎnzhìzàirùyǔshūchūxìtǒng |
_version_ |
1719297850403192832 |