Learning curves for Gaussian process regression on random graphs

Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the ’kernel trick’. They are well understood for inputs from Euclidean spaces, however, much less research has focused on other spaces. In...

Full description

Bibliographic Details
Main Author: Urry, Matthew
Published: King's College London (University of London) 2013
Subjects:
Online Access:http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321
id ndltd-bl.uk-oai-ethos.bl.uk-631321
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-6313212016-06-21T03:30:27ZLearning curves for Gaussian process regression on random graphsUrry, MatthewUrry, Matthew2013Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the ’kernel trick’. They are well understood for inputs from Euclidean spaces, however, much less research has focused on other spaces. In this thesis I aim to at least partially resolve this. In particular I focus on the case where inputs are defined on the vertices of a graph and the task is to learn a function defined on the vertices from noisy examples, i.e. a regression problem. A challenging problem in the area of non-parametric learning is to predict the general-isation error as a function of the number of examples or learning curve. I show that, unlike in the Euclidean case where predictions are either quantitatively accurate for a few specific cases or only qualitatively accurate for a broader range of situations, I am able to derive accurate learning curves for Gaussian processes on graphs for a wide range of input spaces given by ensembles of random graphs. I focus on the random walk kernel but my results generalise to any kernel that can be written as a truncated sum of powers of the normalised graph Laplacian. I begin first with a discussion of the properties of the random walk kernel, which can be viewed as an approximation of the ubiquitous squared exponential kernel in continuous spaces. I show that compared to the squared exponential kernel, the random walk kernel has some surprising properties which includes a non-trivial limiting form for some types of graphs. After investigating the limiting form of the kernel I then study its use as a prior. I propose a solution to this in the form of a local normalisation, where the prior scale at each vertex is normalised locally as desired. To drive home the point about kernel normalisation I then examine the differences between the two kernels when they are used as a Gaussian process prior over functions defined on the vertices of a graph. I show using numerical simulations that the locally normalised kernel leads to a probabilistically more plausible Gaussian process prior. After investigating the properties of the random walk kernel I then discuss the learning curves of a Gaussian process with a random walk kernel for both kernel normalisations in a matched scenario (where student and teacher are both Gaussian processes with matching hyperparameters). I show that by using the cavity method I can derive accu-rate predictions along the whole length of the learning curve that dramatically improves upon previously derived approximations for continuous spaces suitably extended to the discrete graph case. The derivation of the learning curve for the locally normalised kernel required an addi-tional approximation in the resulting cavity equations. I subsequently, therefore, investi-gate this approximation in more detail using the replica method. I show that the locally normalised kernel leads to a highly non-trivial replica calculation, that eventually shows that the approximation used in the cavity analysis amounts to ignoring some consistency requirements between incoming cavity distributions. I focus in particular on a teacher distribution that is given by a Gaussian process with a random walk kernel but different hyperparameters. I show that in this case, by applying the cavity method, I am able once more to calculate accurate predictions of the learning curve. The resulting equations resemble the matched case over an inflated number of variables. To finish this thesis I examine the learning curves for varying degrees of model mismatch.519.2King's College London (University of London)http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321https://kclpure.kcl.ac.uk/portal/en/theses/learning-curves-for-gaussian-process-regression-on-random-graphs(c1f5f395-0426-436c-989c-d0ade913423e).htmlElectronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 519.2
spellingShingle 519.2
Urry, Matthew
Learning curves for Gaussian process regression on random graphs
description Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the ’kernel trick’. They are well understood for inputs from Euclidean spaces, however, much less research has focused on other spaces. In this thesis I aim to at least partially resolve this. In particular I focus on the case where inputs are defined on the vertices of a graph and the task is to learn a function defined on the vertices from noisy examples, i.e. a regression problem. A challenging problem in the area of non-parametric learning is to predict the general-isation error as a function of the number of examples or learning curve. I show that, unlike in the Euclidean case where predictions are either quantitatively accurate for a few specific cases or only qualitatively accurate for a broader range of situations, I am able to derive accurate learning curves for Gaussian processes on graphs for a wide range of input spaces given by ensembles of random graphs. I focus on the random walk kernel but my results generalise to any kernel that can be written as a truncated sum of powers of the normalised graph Laplacian. I begin first with a discussion of the properties of the random walk kernel, which can be viewed as an approximation of the ubiquitous squared exponential kernel in continuous spaces. I show that compared to the squared exponential kernel, the random walk kernel has some surprising properties which includes a non-trivial limiting form for some types of graphs. After investigating the limiting form of the kernel I then study its use as a prior. I propose a solution to this in the form of a local normalisation, where the prior scale at each vertex is normalised locally as desired. To drive home the point about kernel normalisation I then examine the differences between the two kernels when they are used as a Gaussian process prior over functions defined on the vertices of a graph. I show using numerical simulations that the locally normalised kernel leads to a probabilistically more plausible Gaussian process prior. After investigating the properties of the random walk kernel I then discuss the learning curves of a Gaussian process with a random walk kernel for both kernel normalisations in a matched scenario (where student and teacher are both Gaussian processes with matching hyperparameters). I show that by using the cavity method I can derive accu-rate predictions along the whole length of the learning curve that dramatically improves upon previously derived approximations for continuous spaces suitably extended to the discrete graph case. The derivation of the learning curve for the locally normalised kernel required an addi-tional approximation in the resulting cavity equations. I subsequently, therefore, investi-gate this approximation in more detail using the replica method. I show that the locally normalised kernel leads to a highly non-trivial replica calculation, that eventually shows that the approximation used in the cavity analysis amounts to ignoring some consistency requirements between incoming cavity distributions. I focus in particular on a teacher distribution that is given by a Gaussian process with a random walk kernel but different hyperparameters. I show that in this case, by applying the cavity method, I am able once more to calculate accurate predictions of the learning curve. The resulting equations resemble the matched case over an inflated number of variables. To finish this thesis I examine the learning curves for varying degrees of model mismatch.
author2 Urry, Matthew
author_facet Urry, Matthew
Urry, Matthew
author Urry, Matthew
author_sort Urry, Matthew
title Learning curves for Gaussian process regression on random graphs
title_short Learning curves for Gaussian process regression on random graphs
title_full Learning curves for Gaussian process regression on random graphs
title_fullStr Learning curves for Gaussian process regression on random graphs
title_full_unstemmed Learning curves for Gaussian process regression on random graphs
title_sort learning curves for gaussian process regression on random graphs
publisher King's College London (University of London)
publishDate 2013
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631321
work_keys_str_mv AT urrymatthew learningcurvesforgaussianprocessregressiononrandomgraphs
_version_ 1718313319776911360