id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1199284713
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu11992847132021-08-03T05:53:05Z Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data Wang, Chao Computer Science Probabilistic Graphical Model Undirected Graphical Model Markov Random Field Maximum Entropy Feature Selection Frequent Pattern Mining Non-Redundant Frequent Pattern Structured and Semi-Structured Data Social Network Analysis Link Prediction This work seeks to develop a probabilistic framework for modeling, querying and analyzing large-scale structured and semi-structured data. The framework has three components: (1) Mining non-redundant local patterns from data; (2) Gluing these local patterns together by employing probabilistic models (e.g., Markov random field (MRF), Bayesian network); and (3) Reasoning over the data for solving various data analysis tasks. Our contributions are as follows: (a) We present an approach of employing probabilistic models to identify non-redundant itemset patterns from a large collection of frequent itemsets on transactional data. Our approach can effectively eliminate redundancies from a large collection of itemset patterns. (b) We propose a technique of employing local probabilistic models to glue non-redundant itemset patterns together in tackling the link prediction task in co-authorship network analysis. Our technique effectively combines topology analysis on network structure data and frequency analysis on network event log data. The main idea is to consider the co-occurrence probability of two end nodes associated with a candidate link. We propose a method of building MRFs over local data regions to compute this co-occurrence probability. Experimental results demonstrate that the co-occurrence probability inferred from the local probabilistic models is very useful for link prediction. (c) We explore employing global models, models over large data regions, to glue non-redundant itemset patterns together.
We investigate learning approximate global MRFs on large transactional data and propose a divide-and-conquer style modeling approach. Empirical study shows that the models are effective in modeling the data and approximately answering queries on the data. (d) We propose a technique of identifying non-redundant tree patterns from a large collection of structural tree patterns on semi-structured XML data. Our approach can effectively eliminate redundancies from a large collection of structural tree patterns. Furthermore, we present techniques of employing these non-redundant tree patterns as summary statistics for the XML data to solve the XML twig selectivity estimation problem. We propose a probabilistic framework under which the selectivity of a twig query can be estimated from the information of its subtrees. Empirical results demonstrate the efficacy of our approach on real and synthetic datasets. 2008-01-08 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1199284713 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Computer Science
Probabilistic Graphical Model
Undirected Graphical Model
Markov Random Field
Maximum Entropy
Feature Selection
Frequent Pattern Mining
Non-Redundant Frequent Pattern
Structured and Semi-Structured Data
Social Network Analysis
Link Prediction
spellingShingle Computer Science
Probabilistic Graphical Model
Undirected Graphical Model
Markov Random Field
Maximum Entropy
Feature Selection
Frequent Pattern Mining
Non-Redundant Frequent Pattern
Structured and Semi-Structured Data
Social Network Analysis
Link Prediction
Wang, Chao
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
author Wang, Chao
author_facet Wang, Chao
author_sort Wang, Chao
title Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
title_short Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
title_full Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
title_fullStr Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
title_full_unstemmed Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
title_sort exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
publisher The Ohio State University / OhioLINK
publishDate 2008
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1199284713
work_keys_str_mv AT wangchao exploitingnonredundantlocalpatternsandprobabilisticmodelsforanalyzingstructuredandsemistructureddata
_version_ 1719427127948869632