Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data
Main Author: | Wang, Chao |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2008 |
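Subjects: | Computer Science; Probabilistic Graphical Model; Undirected Graphical Model; Markov Random Field; Maximum Entropy; Feature Selection; Frequent Pattern Mining; Non-Redundant Frequent Pattern; Structured and Semi-Structure Data; Social Network Analysis; Link Prediction |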
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1199284713 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1199284713 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1199284713 2021-08-03T05:53:05Z Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data Wang, Chao Computer Science Probabilistic Graphical Model Undirected Graphical Model Markov Random Field Maximum Entropy Feature Selection Frequent Pattern Mining Non-Redundant Frequent Pattern Structured and Semi-Structure Data Social Network Analysis Link Prediction This work develops a probabilistic framework for modeling, querying, and analyzing large-scale structured and semi-structured data. The framework has three components: (1) mining non-redundant local patterns from data; (2) gluing these local patterns together with probabilistic models (e.g., Markov random fields (MRFs), Bayesian networks); and (3) reasoning over the data to solve various data analysis tasks. Our contributions are as follows: (a) We present an approach that employs probabilistic models to identify non-redundant itemset patterns in a large collection of frequent itemsets on transactional data. Our approach effectively eliminates redundancies from a large collection of itemset patterns. (b) We propose a technique that employs local probabilistic models to glue non-redundant itemset patterns together for the link prediction task in co-authorship network analysis. Our technique combines topology analysis on network structure data with frequency analysis on network event log data. The main idea is to consider the co-occurrence probability of the two end nodes associated with a candidate link. We propose a method of building MRFs over local data regions to compute this co-occurrence probability. Experimental results demonstrate that the co-occurrence probability inferred from the local probabilistic models is very useful for link prediction. (c) We explore employing global models, that is, models over large data regions, to glue non-redundant itemset patterns together. We investigate learning approximate global MRFs on large transactional data and propose a divide-and-conquer style modeling approach. An empirical study shows that the models are effective in modeling the data and in approximately answering queries on the data. (d) We propose a technique for identifying non-redundant tree patterns in a large collection of structural tree patterns on semi-structured XML data. Our approach effectively eliminates redundancies from a large collection of structural tree patterns. Furthermore, we present techniques that employ these non-redundant tree patterns as summary statistics for the XML data to solve the XML twig selectivity estimation problem. We propose a probabilistic framework under which the selectivity of a twig query can be estimated from information about its subtrees. Empirical results demonstrate the efficacy of our approach on real and synthetic datasets. 2008-01-08 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1199284713 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science Probabilistic Graphical Model Undirected Graphical Model Markov Random Field Maximum Entropy Feature Selection Frequent Pattern Mining Non-Redundant Frequent Pattern Structured and Semi-Structure Data Social Network Analysis Link Prediction |
spellingShingle |
Computer Science Probabilistic Graphical Model Undirected Graphical Model Markov Random Field Maximum Entropy Feature Selection Frequent Pattern Mining Non-Redundant Frequent Pattern Structured and Semi-Structure Data Social Network Analysis Link Prediction Wang, Chao Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
author |
Wang, Chao |
author_facet |
Wang, Chao |
author_sort |
Wang, Chao |
title |
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
title_short |
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
title_full |
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
title_fullStr |
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
title_full_unstemmed |
Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
title_sort |
exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data |
publisher |
The Ohio State University / OhioLINK |
publishDate |
2008 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=osu1199284713 |
work_keys_str_mv |
AT wangchao exploitingnonredundantlocalpatternsandprobabilisticmodelsforanalyzingstructuredandsemistructureddata |
_version_ |
1719427127948869632 |