Data mining and decision support in pharmaceutical databases

This thesis lies in the area of chemoinformatics known as virtual screening (VS). VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in...

Bibliographic Details
Main Author: Pasupa, Kitsuchart
Published: University of Sheffield 2007
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615
id ndltd-bl.uk-oai-ethos.bl.uk-487615
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-487615; 2017-07-25T03:28:51Z; Data mining and decision support in pharmaceutical databases; Pasupa, Kitsuchart; 2007; 610.21; University of Sheffield; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 610.21
spellingShingle 610.21
Pasupa, Kitsuchart
Data mining and decision support in pharmaceutical databases
description This thesis lies in the area of chemoinformatics known as virtual screening (VS). VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in a particular domain, e.g. pain relief or reduction of inflammation. This is important because reducing the cost and, crucially, the time spent in the early stages of compound development can have a disproportionate benefit for profitability in a cycle that has a short patent lifetime. Machine learning methods are becoming popular in this domain, but problems arise when 2D fingerprints are used as descriptors. Fingerprints are an extremely sparse, binary-valued representation of molecules. Furthermore, VS suffers strongly from the so-called "small-sample-size" problem, where the number of covariates is comparable to or exceeds the number of samples. These problems can be addressed by developing machine learning algorithms that can handle very large sets of high-dimensional data. High-dimensional data contains an unprecedented level of complexity, so some form of complexity control is necessary. Alternatively, a suitable dimensionality reduction method can be used. This thesis consists of four major pieces of work, all conducted with the MDL Drug Data Report (MDDR) database: (i) development of binary kernel discrimination (BKD); (ii) a new algorithm for the kernel machine family, the so-called "parsimonious kernel Fisher discrimination", which is then applied to VS tasks; (iii) prediction by posterior estimation in VS; (iv) a comparison of four variants of principal component analysis with potential in VS. The experiments show that BKD in conjunction with Jaccard/Tanimoto is the best method, while the other approaches are less accurate than BKD but still comparable in a number of cases.
author Pasupa, Kitsuchart
title Data mining and decision support in pharmaceutical databases
publisher University of Sheffield
publishDate 2007
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615
work_keys_str_mv AT pasupakitsuchart datamininganddecisionsupportinpharmaceuticaldatabases
_version_ 1718504579274899456
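
The abstract refers to sparse, binary-valued 2D fingerprints and to the Jaccard/Tanimoto measure. As a minimal illustration only, and not the thesis's BKD method, the following Python sketch computes the Jaccard/Tanimoto coefficient as it is commonly defined (shared on-bits divided by the union of on-bits) and ranks a small library against a query. The fingerprints, the molecule names and the helper names `tanimoto` and `rank_by_similarity` are all hypothetical; storing each fingerprint as a set of on-bit indices is an assumption chosen to suit sparse data.

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Jaccard/Tanimoto coefficient of two fingerprints.

    Each fingerprint is given as the set of indices of its "on" bits,
    which is a compact way to hold an extremely sparse binary vector.
    """
    union = len(a | b)
    if union == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def rank_by_similarity(
    query: AbstractSet[int], library: Dict[str, AbstractSet[int]]
) -> List[str]:
    """Return library names ordered from most to least similar to the query."""
    return sorted(
        library, key=lambda name: tanimoto(query, library[name]), reverse=True
    )


# Hypothetical toy data: on-bit indices of made-up fingerprints.
query = frozenset({1, 5, 9, 42})
library = {
    "mol_a": frozenset({1, 5, 9, 40}),
    "mol_b": frozenset({2, 3}),
    "mol_c": frozenset({1, 5, 9, 42, 77}),
}

ranking = rank_by_similarity(query, library)
```

With this toy data, `mol_c` shares four of five distinct on-bits with the query, `mol_a` three of five, and `mol_b` none, so the ranking places them in that order. A similarity ranking of this kind is only a baseline; the kernel methods named in the abstract build further machinery on top of such a measure.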