Role of biases in neural network models

Bibliographic Details
Main Author: West, Ansgar Heinrich Ludolf
Published: University of Edinburgh 1997
Subjects: 006.3
Format: Electronic Thesis or Dissertation
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657
Online Access: http://hdl.handle.net/1842/11546
Description

The capacity problem for multi-layer networks has proven especially elusive. Our calculation of the capacity of multi-layer networks built by constructive algorithms relies heavily on the existence of biases in the basic building block, the binary perceptron. This is the first time the capacity has been evaluated explicitly for large networks and finite stability. One finds that the constructive algorithms studied, a tiling-like algorithm and variants of the upstart algorithm, do not saturate the known Mitchison-Durbin bound.

In supervised learning, a student network is presented with training examples in the form of input-output pairs, where the output is generated by a teacher network. The central question is the relation between the number of examples presented and the typical performance of the student in approximating the teacher rule, usually termed generalisation. The influence of biases in such a student-teacher scenario has been assessed for the two-layer soft-committee architecture, which is a universal approximator and already resembles applicable multi-layer network models, within the on-line learning paradigm, where training examples are presented serially. One finds that adjustable biases dramatically alter the learning behaviour. The suboptimal symmetric phase, which can easily dominate training for fixed biases, vanishes almost entirely for non-degenerate teacher biases. Furthermore, the extended model exhibits much richer dynamical behaviour, exemplified especially by a multitude of (attractive) suboptimal fixed points even in realisable cases, which cause training to fail or to be severely slowed down.

In addition, to study possible improvements over gradient descent training, an adaptive back-propagation algorithm parameterised by a "temperature" is introduced, which enhances the ability of the student to distinguish between teacher nodes. This algorithm, studied in the various stages of learning, provides more effective symmetry breaking between hidden units and faster convergence to optimal generalisation.
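The abstract summarises rather than specifies the training procedure, so the following is a minimal, hypothetical sketch of the student-teacher scenario it describes: on-line gradient descent for a two-layer soft-committee student with adjustable biases, learning a fixed teacher rule of the same architecture. All concrete choices here (dimensions, learning rate, the erf activation, the 1/N scaling, variable names) are illustrative assumptions, not the thesis's actual code.

```python
# Hypothetical sketch: on-line learning of a soft-committee student from a teacher.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, K, M = 100, 3, 3          # input dimension, student / teacher hidden units (assumed)
eta = 0.5                    # learning rate; each update is scaled by 1/N for simplicity

def g(x):                    # hidden-unit activation commonly used in this literature
    return erf(x / np.sqrt(2.0))

def dg(x):                   # derivative of the activation
    return np.sqrt(2.0 / np.pi) * np.exp(-x**2 / 2.0)

# Teacher: fixed weights B and non-degenerate biases rho define the target rule.
B = rng.standard_normal((M, N))
rho = np.linspace(-1.0, 1.0, M)

# Student: adaptable weights w and adjustable biases theta.
w = rng.standard_normal((K, N)) / np.sqrt(N)
theta = np.zeros(K)

def soft_committee(weights, biases, x):
    """Sum of hidden-unit activations; hidden-to-output weights are fixed to one."""
    return g(weights @ x + biases).sum()

for step in range(200_000):
    x = rng.standard_normal(N)              # one fresh example per step (on-line paradigm)
    y = soft_committee(B, rho, x)           # teacher output plays the role of the label
    a = w @ x + theta                       # student pre-activations
    delta = y - g(a).sum()                  # output error on this single example
    # Stochastic gradient step on the squared error, for both weights and biases.
    w += (eta / N) * delta * dg(a)[:, None] * x[None, :]
    theta += (eta / N) * delta * dg(a)

    if step % 50_000 == 0:
        # Monte Carlo estimate of the generalisation error (mean squared error over
        # fresh inputs) -- the quantity whose decay the student-teacher analysis tracks.
        X = rng.standard_normal((2000, N))
        ys = g(X @ B.T + rho).sum(axis=1)
        yhat = g(X @ w.T + theta).sum(axis=1)
        print(step, 0.5 * np.mean((ys - yhat) ** 2))
```

The plain gradient step above corresponds to the baseline training the abstract refers to; the adaptive back-propagation algorithm it introduces would replace this step with a "temperature"-parameterised variant that helps the student's hidden units discriminate between teacher nodes. Its precise form is not given in the abstract and is not reproduced here.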