Materialization strategies for web based search computing applications

Bibliographic Details
Main Author: Zagorac, Srdan (Author)
Other Authors: Pears, Russel (Contributor)
Format: Others
Published: Auckland University of Technology, 2015-11-25T23:02:28Z.
Subjects:
Online Access: Get fulltext
LEADER 02802 am a22002413u 4500
001 9274
042 |a dc 
100 1 0 |a Zagorac, Srdan  |e author 
100 1 0 |a Pears, Russel  |e contributor 
245 0 0 |a Materialization strategies for web based search computing applications 
260 |b Auckland University of Technology,   |c 2015-11-25T23:02:28Z. 
520 |a In this thesis we provide a characterization of view materialization in the context of multi-domain heterogeneous search applications. Web data view materialization is presented as a solution to the technical constraints and problems implied by the unknown structure of web data sources. The web data materialization model extends the search computing (SeCo) multi-layered model, in which search services are registered in a service repository that describes the functional information (e.g. invocation end-point, input and output attributes) of data end-points. Our first research goal is to solve the problem of finding a sequence of access patterns which, when executed, produces a materialization output. For the first research goal we make the following novel contributions: 1) Formulation of the building blocks for the materialization feasibility analysis; 2) Definition of the materialization feasibility analysis method and the accompanying algorithms; 3) A detailed empirical study conducted on a set of materialization tasks of varying schema dependency complexity. Our second research goal is the optimization of the materialization process so that the optimal sequence, in terms of materialization output efficiency and quality, is executed at all times. 
For this goal we make the following novel contributions: 1) Formulation of a set of performance dimensions and their metrics for web source materialization; 2) A cost model that uses optimization metrics to differentiate between web services in terms of materialization time; 3) A query optimization procedure that explores the characteristics of the underlying source data domain in order to prioritize the execution of the most productive queries in terms of their data harvesting power; 4) Materialization process optimization strategies based on the web source performance dimension metrics and the query optimization procedure; 5) A detailed empirical study conducted on several relevant web-based data sources that clearly shows the effectiveness of the proposed solution. 
540 |a OpenAccess 
546 |a en 
650 0 4 |a Multi-domain search 
650 0 4 |a Materialization feasibility analysis 
650 0 4 |a Materialization optimization 
650 0 4 |a Data surfacing 
650 0 4 |a Deep web mining 
650 0 4 |a Web data materialization 
650 0 4 |a Web data services 
655 7 |a Thesis 
856 |z Get fulltext  |u http://hdl.handle.net/10292/9274