A sequence-length sensitive approach to learning biological grammars using inductive logic programming

This thesis aims to investigate if the ideas behind compression principles, such as the Minimum Description Length, can help us to improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Contrary to most traditional ILP learning problems, b...

Bibliographic Details
Main Author: Mamer, Thierry
Other Authors: McCall, John ; Bryant, Chris
Published: Robert Gordon University 2011
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.535812
Handle: http://hdl.handle.net/10059/662
Format: Electronic Thesis or Dissertation
Topic: 572.072
Description: This thesis aims to investigate whether the ideas behind compression principles, such as the Minimum Description Length, can help improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Contrary to most traditional ILP learning problems, biological sequences often vary widely in length. This variation in length is an important feature of biological sequences which should not be ignored by ILP systems. However, we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples which are covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples which they learn from. An empirical study was undertaken to investigate whether significant improvements can be achieved by applying L-modification to a standard clause evaluation function. Furthermore, we investigated more generally how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted and thus we prove that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems does have an undeniable impact on the results.
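
The contrast the abstract draws, between a clause evaluation function that ignores example length and one that accounts for it, can be illustrated with a small sketch. This record does not give the thesis's actual definition of L-modification, so everything below is an assumption made for illustration: the `Example` type, modelling a clause as a plain predicate over strings, the simple coverage score (positives covered minus negatives covered), and the choice to weight each covered example by its sequence length. It should be read as one plausible way of making a score length-sensitive, not as the method proposed in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """Hypothetical training example (not a type from any ILP system)."""
    sequence: str   # e.g. a protein sequence in one-letter amino-acid codes
    positive: bool  # True for a positive example, False for a negative one


# Assumption: a clause is modelled as a predicate over sequences. A real ILP
# system would decide coverage by logical inference against background knowledge.
Clause = Callable[[str], bool]


def coverage_score(clause: Clause, examples: Sequence[Example]) -> int:
    """Length-blind score: positives covered minus negatives covered."""
    score = 0
    for ex in examples:
        if clause(ex.sequence):
            score += 1 if ex.positive else -1
    return score


def length_weighted_score(clause: Clause, examples: Sequence[Example]) -> int:
    """Illustrative length-sensitive variant (an assumption, not the thesis's
    formula): each covered example counts in proportion to its length."""
    score = 0
    for ex in examples:
        if clause(ex.sequence):
            weight = len(ex.sequence)
            score += weight if ex.positive else -weight
    return score


# Toy data, invented for this sketch.
examples = [
    Example("MKTAYIAKQR", True),
    Example("MKV", True),
    Example("MKTLLLTLVVVTIVCLDLGYT", False),
]


def starts_with_mk(sequence: str) -> bool:
    """Toy clause: covers any sequence beginning with 'MK'."""
    return sequence.startswith("MK")


print("coverage:", coverage_score(starts_with_mk, examples))
print("length-weighted:", length_weighted_score(starts_with_mk, examples))
```

On the toy data the two scores disagree in sign: the length-blind score favours the clause because it covers two positives and one negative, while the length-weighted score penalises it because the single negative example is longer than both positives combined. This is the kind of difference in search guidance that the abstract attributes to taking example length into account.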