Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice
Learning on graphs has been studied for decades and abundant models have been proposed, yet many of their behaviors and relations remain unclear. This thesis fills this gap by introducing a novel second-order Markov chain called partially absorbing random walks (ParWalk). Unlike an ordinary random walk...
Main Author: | Wu, Xiaoming |
---|---|
Language: | English |
Published: | 2016 |
Subjects: | Markov processes; Graph theory; Computer science |
Online Access: | https://doi.org/10.7916/D8JW8F0C |
id |
ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8JW8F0C |
record_format |
oai_dc |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Markov processes Graph theory Computer science |
description |
Learning on graphs has been studied for decades and abundant models have been proposed, yet many of their behaviors and relations remain unclear. This thesis fills this gap by introducing a novel second-order Markov chain, called partially absorbing random walks (ParWalk). Unlike an ordinary random walk, a ParWalk is absorbed at the current state $i$ with probability $p_i$, and follows a random edge out with probability $1-p_i$. The partial absorption yields an absorption probability between any two vertices, which turns out to encompass various popular models including PageRank, hitting times, label propagation, and regularized Laplacian kernels. This unified treatment reveals the distinguishing characteristics of these models arising from different contexts, and allows us to compare them and transfer findings from one paradigm to another.
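The first-step recursion implied by this verbal definition can be solved directly as a small linear system: a walk at $i$ is absorbed there with probability $p_i$, and otherwise steps to a neighbor in proportion to edge weight. A minimal sketch of this computation (the function name and the dense linear-algebra formulation are illustrative assumptions, not code from the thesis):

```python
import numpy as np

def parwalk_absorption(W, p):
    """Absorption probabilities of a partially absorbing random walk.

    W -- symmetric nonnegative weight matrix, shape (n, n)
    p -- absorbing capacity p_i of each vertex, values in (0, 1]

    Returns A with A[i, j] = probability that a walk started at i is
    eventually absorbed at j.
    """
    W = np.asarray(W, dtype=float)
    p = np.asarray(p, dtype=float)
    d = W.sum(axis=1)                           # vertex degrees
    # With probability 1 - p_i the walk leaves i along a random edge,
    # choosing neighbor j with probability W[i, j] / d_i.
    P = (1.0 - p)[:, None] * W / d[:, None]     # sub-stochastic transition part
    # First-step analysis: A = diag(p) + P @ A, i.e. (I - P) A = diag(p).
    return np.linalg.solve(np.eye(len(p)) - P, np.diag(p))
```

Since every $p_i > 0$, the walk is absorbed almost surely, so each row of the returned matrix sums to one.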
The key to learning on graphs is capitalizing on the cluster structure of the underlying graph. The absorption probabilities of ParWalk turn out to be highly effective in capturing this cluster structure. Given a query vertex $q$ in a cluster $\mathcal{S}$, we show that when the absorbing capacity ($p_i$) of each vertex on the graph is small, the probabilities of ParWalk being absorbed at $q$ vary little in regions of high conductance (within clusters), but exhibit large gaps in regions of low conductance (between clusters). Moreover, the less absorbent the vertices of $\mathcal{S}$ are, the better the absorption probabilities represent the local cluster $\mathcal{S}$. Our theory induces principles for designing reliable similarity measures and provides justification for a number of popular ones, such as hitting times and the pseudo-inverse of the graph Laplacian. Furthermore, it reveals important new properties of these measures. For example, we are the first to show that hitting times are better at retrieving sparse clusters, while the pseudo-inverse of the graph Laplacian is better for dense ones.
The theoretical insights distilled from ParWalk guide us in developing robust algorithms for various applications, including local clustering, semi-supervised learning, and ranking. For local clustering, we propose a new method for salient object segmentation: taking a noisy saliency map as the probability distribution of query vertices, we compute the absorption probabilities of ParWalk to the queries, producing a high-quality refined saliency map from which the objects can be easily segmented. For semi-supervised learning, we propose a new algorithm for label propagation; the algorithm is justified by our theoretical analysis and guaranteed to outperform many existing ones. For ranking, we design a new similarity measure using ParWalk that combines the strengths of both hitting times and the pseudo-inverse of the graph Laplacian. The hybrid similarity measure adapts well to complex data of diverse density and thus performs well overall. For all these learning tasks, our methods achieve substantial improvements over the state of the art on extensive benchmark datasets. |
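The claimed connection to the pseudo-inverse of the graph Laplacian can be checked numerically for a uniform special case. The closed form below, $A = \alpha(\alpha I + L)^{-1}$ for a uniform absorbing rate $\alpha$ (absorbing capacity $p_i = \alpha/(\alpha + d_i)$ at every vertex), is our own reading of the verbal definition, not a formula quoted from the thesis; for small $\alpha$, the column $A[:, q]$ is, up to a constant shift, proportional to column $q$ of $L^+$, so both rank the vertices identically:

```python
import numpy as np

# Sketch (assumed closed form): with a small uniform absorbing rate alpha,
# absorption probabilities reduce to A = alpha * (alpha*I + L)^{-1}.
# Expanding in alpha gives A[:, q] ~ 1/n + alpha * L^+[:, q], so the
# ranking induced by A[:, q] matches that of the pseudo-inverse column.
rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2                 # symmetric edge weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W    # graph Laplacian

alpha = 1e-4
A = alpha * np.linalg.solve(alpha * np.eye(n) + L, np.eye(n))
L_pinv = np.linalg.pinv(L)

q = 0
same_ranking = np.array_equal(np.argsort(A[:, q]), np.argsort(L_pinv[:, q]))
print(same_ranking)
```

Since $L\mathbf{1} = 0$, each row of $A$ still sums to one, so $A$ remains a valid absorption-probability matrix in this limit.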
author |
Wu, Xiaoming |
author_facet |
Wu, Xiaoming |
author_sort |
Wu, Xiaoming |
title |
Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice |
title_short |
Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice |
title_full |
Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice |
title_fullStr |
Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice |
title_full_unstemmed |
Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice |
title_sort |
learning on graphs with partially absorbing random walks: theory and practice |
publishDate |
2016 |
url |
https://doi.org/10.7916/D8JW8F0C |
work_keys_str_mv |
AT wuxiaoming learningongraphswithpartiallyabsorbingrandomwalkstheoryandpractice |
_version_ |
1719046442916511744 |