Learning curves for Gaussian process regression on random graphs

Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the ‘kernel trick’. They are well understood for inputs from Euclidean spaces; however, much less research has focused on other spaces. In...


Bibliographic Details
Main Author:	Urry, Matthew
Published:	King's College London (University of London) 2013
Subjects:	519.2
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321

id	ndltd-bl.uk-oai-ethos.bl.uk-631321
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-631321 2016-06-21T03:30:27Z Learning curves for Gaussian process regression on random graphs Urry, Matthew 2013 519.2 King's College London (University of London) http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321 https://kclpure.kcl.ac.uk/portal/en/theses/learning-curves-for-gaussian-process-regression-on-random-graphs(c1f5f395-0426-436c-989c-d0ade913423e).html Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	519.2
spellingShingle	519.2 Urry, Matthew Learning curves for Gaussian process regression on random graphs
description	Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the ‘kernel trick’. They are well understood for inputs from Euclidean spaces; however, much less research has focused on other spaces. In this thesis I aim to at least partially resolve this. In particular I focus on the case where inputs are defined on the vertices of a graph and the task is to learn a function defined on the vertices from noisy examples, i.e. a regression problem. A challenging problem in the area of non-parametric learning is to predict the generalisation error as a function of the number of examples, i.e. the learning curve. I show that accurate learning curves can be derived for Gaussian processes on graphs for a wide range of input spaces given by ensembles of random graphs, unlike in the Euclidean case, where predictions are either quantitatively accurate for a few specific cases or only qualitatively accurate for a broader range of situations. I focus on the random walk kernel but my results generalise to any kernel that can be written as a truncated sum of powers of the normalised graph Laplacian. I begin with a discussion of the properties of the random walk kernel, which can be viewed as an approximation of the ubiquitous squared exponential kernel in continuous spaces. I show that, compared to the squared exponential kernel, the random walk kernel has some surprising properties, including a non-trivial limiting form for some types of graphs. After investigating the limiting form of the kernel I then study its use as a prior. I propose a solution to this in the form of a local normalisation, where the prior scale at each vertex is normalised locally as desired.
To drive home the point about kernel normalisation I then examine the differences between the two kernels when they are used as a Gaussian process prior over functions defined on the vertices of a graph. I show using numerical simulations that the locally normalised kernel leads to a probabilistically more plausible Gaussian process prior. After investigating the properties of the random walk kernel I then discuss the learning curves of a Gaussian process with a random walk kernel for both kernel normalisations in a matched scenario (where student and teacher are both Gaussian processes with matching hyperparameters). I show that by using the cavity method I can derive accurate predictions along the whole length of the learning curve that dramatically improve upon previously derived approximations for continuous spaces suitably extended to the discrete graph case. The derivation of the learning curve for the locally normalised kernel required an additional approximation in the resulting cavity equations. I therefore subsequently investigate this approximation in more detail using the replica method. I show that the locally normalised kernel leads to a highly non-trivial replica calculation, which eventually shows that the approximation used in the cavity analysis amounts to ignoring some consistency requirements between incoming cavity distributions. I focus in particular on a teacher distribution that is given by a Gaussian process with a random walk kernel but different hyperparameters. I show that in this case, by applying the cavity method, I am able once more to calculate accurate predictions of the learning curve. The resulting equations resemble the matched case over an inflated number of variables. To finish this thesis I examine the learning curves for varying degrees of model mismatch.
author2	Urry, Matthew
author_facet	Urry, Matthew Urry, Matthew
author	Urry, Matthew
author_sort	Urry, Matthew
title	Learning curves for Gaussian process regression on random graphs
title_short	Learning curves for Gaussian process regression on random graphs
title_full	Learning curves for Gaussian process regression on random graphs
title_fullStr	Learning curves for Gaussian process regression on random graphs
title_full_unstemmed	Learning curves for Gaussian process regression on random graphs
title_sort	learning curves for gaussian process regression on random graphs
publisher	King's College London (University of London)
publishDate	2013
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321
work_keys_str_mv	AT urrymatthew learningcurvesforgaussianprocessregressiononrandomgraphs
_version_	1718313319776911360
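
The description mentions a random walk kernel and a local normalisation of the prior scale but gives no formulas. The following is a minimal sketch under stated assumptions, not the thesis's own code. The kernel form K = ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p, the parameter names `a` and `p`, the function names, and the star-graph example are all assumptions chosen for illustration. This form is a polynomial in the normalised graph Laplacian L, since it equals (I - L/a)^p.

```python
import numpy as np


def random_walk_kernel(adjacency, a=2.0, p=3):
    """Unnormalised kernel ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p (assumed form)."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("every vertex needs at least one edge")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetrically normalised adjacency matrix D^{-1/2} A D^{-1/2}.
    normalised_adjacency = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    step = (1.0 - 1.0 / a) * np.eye(len(A)) + normalised_adjacency / a
    return np.linalg.matrix_power(step, p)


def normalise_globally(K):
    """Rescale so that the average prior variance over vertices is 1."""
    return K / np.mean(np.diag(K))


def normalise_locally(K):
    """Rescale so that the prior variance at every vertex is exactly 1."""
    scale = np.sqrt(np.diag(K))
    return K / np.outer(scale, scale)


# Hypothetical example graph: a star with hub 0 and leaves 1..4.
A = np.zeros((5, 5))
A[0, 1:] = 1.0
A[1:, 0] = 1.0

K = random_walk_kernel(A, a=2.0, p=3)
global_C = normalise_globally(K)
local_C = normalise_locally(K)

print("prior variances, global normalisation:", np.diag(global_C))
print("prior variances, local normalisation: ", np.diag(local_C))
```

With the global normalisation the hub and the leaves end up with different prior variances, whereas the local normalisation fixes the variance at every vertex, which is the distinction the description draws between the two kernels.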


Similar Items