Role of biases in neural network models
Main Author: | West, Ansgar Heinrich Ludolf |
---|---|
Published: | University of Edinburgh, 1997 |
Subjects: | 006.3 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-663657 |
---|---|
record_format |
oai_dc |
spelling |
Role of biases in neural network models. West, Ansgar Heinrich Ludolf. 1997.

The capacity problem for multi-layer networks has proven especially elusive. Our calculation of the capacity of multi-layer networks built by constructive algorithms relies heavily on the existence of biases in the basic building block, the binary perceptron. This is the first time the capacity has been explicitly evaluated for large networks and finite stability. One finds that the constructive algorithms studied, a tiling-like algorithm and variants of the upstart algorithm, do not saturate the known Mitchison-Durbin bound.

In supervised learning, a student network is presented with training examples in the form of input-output pairs, where the output is generated by a teacher network. The central question is the relation between the number of examples presented and the typical performance of the student in approximating the teacher rule, usually termed generalisation. The influence of biases in such a student-teacher scenario has been assessed for the two-layer soft-committee architecture, which is a universal approximator and already resembles applicable multi-layer network models, within the on-line learning paradigm, where training examples are presented serially. One finds that adjustable biases dramatically alter the learning behaviour. The suboptimal symmetric phase, which can easily dominate training for fixed biases, vanishes almost entirely for non-degenerate teacher biases. Furthermore, the extended model exhibits a much richer dynamical behaviour, exemplified especially by a multitude of (attractive) suboptimal fixed points even for realizable cases, causing training to fail or to be severely slowed down.

In addition, in order to study possible improvements over gradient descent training, an adaptive back-propagation algorithm parameterised by a "temperature" is introduced, which enhances the ability of the student to distinguish between teacher nodes. This algorithm, which has been studied in the various learning stages, provides more effective symmetry breaking between hidden units and faster convergence to optimal generalisation.

006.3. University of Edinburgh. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 http://hdl.handle.net/1842/11546. Electronic Thesis or Dissertation. |
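The on-line student-teacher scenario summarised in the abstract can be sketched in a few lines: a student soft-committee machine with adjustable weights and biases is trained by gradient descent on examples labelled by a fixed teacher, one example at a time. This is a minimal illustration, not the thesis's formalism: the network sizes, the tanh activation (the thesis's analysis uses erf-type units), the learning rate, and the teacher bias values are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 2                 # input dimension; hidden units in student and teacher
g = np.tanh                   # hidden activation (illustrative choice)

# Teacher: fixed weights and non-degenerate biases generate the target outputs.
W_t = rng.normal(size=(K, N)) / np.sqrt(N)
b_t = np.array([0.5, -0.5])

# Student: trainable weights and adjustable biases, randomly initialised.
W_s = rng.normal(size=(K, N)) / np.sqrt(N)
b_s = np.zeros(K)

eta = 0.1                     # learning rate

def committee(W, b, x):
    """Soft-committee output: unweighted sum of the hidden-unit activations."""
    return g(W @ x + b).sum()

for _ in range(10_000):       # on-line learning: each example is seen once, then discarded
    x = rng.normal(size=N)                  # random input
    y = committee(W_t, b_t, x)              # teacher provides the target output
    h = W_s @ x + b_s                       # student pre-activations
    err = committee(W_s, b_s, x) - y        # output error
    delta = err * (1.0 - np.tanh(h) ** 2)   # back-propagated error per hidden unit
    W_s -= (eta / N) * np.outer(delta, x)   # gradient step on the squared error
    b_s -= eta * delta                      # biases adapt at their own (unscaled) rate
```

With fixed `b_s`, symmetric configurations in which every student unit imitates the same mixture of teacher units can dominate training; making the biases trainable, as above, is what the abstract reports breaks that symmetry for non-degenerate teacher biases.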
collection |
NDLTD |
sources |
NDLTD |
topic |
006.3 |
author |
West, Ansgar Heinrich Ludolf |
title |
Role of biases in neural network models |
publisher |
University of Edinburgh |
publishDate |
1997 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 |