Role of biases in neural network models
Main Author: | West, Ansgar Heinrich Ludolf |
---|---|
Published: | University of Edinburgh, 1997 |
Subjects: | 006.3 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-663657 |
---|---|
record_format |
oai_dc |
spelling |
Role of biases in neural network models. West, Ansgar Heinrich Ludolf. 1997.

The capacity problem for multi-layer networks has proven especially elusive. Our calculation of the capacity of multi-layer networks built by constructive algorithms relies heavily on the existence of biases in the basic building block, the binary perceptron. This is the first time the capacity has been explicitly evaluated for large networks and finite stability. One finds that the constructive algorithms studied, a tiling-like algorithm and variants of the upstart algorithm, do not saturate the known Mitchison-Durbin bound.

In supervised learning, a student network is presented with training examples in the form of input-output pairs, where the output is generated by a teacher network. The central question is the relation between the number of examples presented and the typical performance of the student in approximating the teacher rule, usually termed generalisation. The influence of biases in such a student-teacher scenario has been assessed for the two-layer soft-committee architecture, which is a universal approximator and already resembles applicable multi-layer network models, within the on-line learning paradigm, where training examples are presented serially. One finds that adjustable biases dramatically alter the learning behaviour. The suboptimal symmetric phase, which can easily dominate training for fixed biases, vanishes almost entirely for non-degenerate teacher biases. Furthermore, the extended model exhibits a much richer dynamical behaviour, exemplified especially by a multitude of (attractive) suboptimal fixed points even for realizable cases, causing training to fail or to be severely slowed down.

In addition, in order to study possible improvements over gradient descent training, an adaptive back-propagation algorithm parameterised by a "temperature" is introduced, which enhances the ability of the student to distinguish between teacher nodes. This algorithm, which has been studied in the various learning stages, provides more effective symmetry breaking between hidden units and faster convergence to optimal generalisation.

006.3. University of Edinburgh. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 http://hdl.handle.net/1842/11546. Electronic Thesis or Dissertation. |
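The on-line student-teacher scenario summarised in the abstract can be sketched in a few lines: a student soft-committee machine with adjustable weights and biases is trained by gradient descent on examples labelled by a fixed teacher, one example at a time. This is a minimal illustration, not the thesis's formalism: the network sizes, the tanh activation (the thesis's analysis uses erf-type units), the learning rate, and the teacher bias values are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 2                 # input dimension; hidden units in student and teacher
g = np.tanh                   # hidden activation (illustrative choice)

# Teacher: fixed weights and non-degenerate biases generate the target outputs.
W_t = rng.normal(size=(K, N)) / np.sqrt(N)
b_t = np.array([0.5, -0.5])

# Student: trainable weights and adjustable biases, randomly initialised.
W_s = rng.normal(size=(K, N)) / np.sqrt(N)
b_s = np.zeros(K)

eta = 0.1                     # learning rate

def committee(W, b, x):
    """Soft-committee output: unweighted sum of the hidden-unit activations."""
    return g(W @ x + b).sum()

for _ in range(10_000):       # on-line learning: each example is seen once, then discarded
    x = rng.normal(size=N)                  # random input
    y = committee(W_t, b_t, x)              # teacher provides the target output
    h = W_s @ x + b_s                       # student pre-activations
    err = committee(W_s, b_s, x) - y        # output error
    delta = err * (1.0 - np.tanh(h) ** 2)   # back-propagated error per hidden unit
    W_s -= (eta / N) * np.outer(delta, x)   # gradient step on the squared error
    b_s -= eta * delta                      # biases adapt at their own (unscaled) rate
```

With fixed `b_s`, symmetric configurations in which every student unit imitates the same mixture of teacher units can dominate training; making the biases trainable, as above, is what the abstract reports breaks that symmetry for non-degenerate teacher biases.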
collection |
NDLTD |
sources |
NDLTD |
topic |
006.3 |
author |
West, Ansgar Heinrich Ludolf |
title |
Role of biases in neural network models |
publisher |
University of Edinburgh |
publishDate |
1997 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.663657 |