Analýzy reálných dat a jejich využití

Bibliographic Details
Main Author: Stárka, Jakub
Other Authors: Holubová, Irena
Format: Doctoral Thesis
Language: English
Published: 2013
Online Access: http://www.nusl.cz/ntk/nusl-327418
Description
Summary:
Title: Analyses of Real-World Data and Their Exploitation
Author: Mgr. Jakub Stárka
Department: Department of Software Engineering
Supervisor: RNDr. Irena Holubová, Ph.D.
Abstract: The typical optimization strategy of many data processing techniques is exploitation of the knowledge of constructs typically used in real-world applications. However, such an approach requires a repeatable, updatable and detailed analysis of a representative data set. This requirement raises a number of related problems, such as automatic crawling of the data, data extraction, schema inference, and efficient execution of analyses over a huge data volume, as well as exploitation of the results in current applications. In this thesis we describe a complex framework for performing statistical analyses of real-world documents, and we propose characteristics that appropriately capture and describe features of XML documents, RDF triples and XQuery queries. Additionally, we provide experimental results over a few selected real-world data sets. Last but not least, we introduce an easily extensible tool that enables one to implement, test and compare new modules of the XML schema inference process. We describe not only the framework but also the area of schema inference in general, including related work and open problems.
Keywords:...