Data mining and decision support in pharmaceutical databases

This thesis lies in the area of chemoinformatics known as virtual screening (VS). VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in...

Bibliographic Details
Main Author:	Pasupa, Kitsuchart
Published:	University of Sheffield 2007
Subjects:	610.21
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615

id	ndltd-bl.uk-oai-ethos.bl.uk-487615
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-487615 2017-07-25T03:28:51Z Data mining and decision support in pharmaceutical databases Pasupa, Kitsuchart 2007 610.21 University of Sheffield http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615 Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	610.21
spellingShingle	610.21 Pasupa, Kitsuchart Data mining and decision support in pharmaceutical databases
description	This thesis lies in the area of chemoinformatics known as virtual screening (VS). VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in a particular domain, e.g. pain relief or reduction of inflammation. This is important because reducing the cost and, crucially, the time spent in the early stages of compound development can have a disproportionate benefit for profitability in a development cycle with a short patent lifetime. Machine learning methods are becoming popular in this domain, but problems arise when 2D fingerprints are used as descriptors. Fingerprints are an extremely sparse, binary-valued representation of molecules. Furthermore, VS suffers strongly from the so-called "small-sample-size" problem, where the number of covariates is comparable to or exceeds the number of samples. These problems can be addressed by developing machine learning algorithms that can handle very large sets of high-dimensional data. High-dimensional data contains an unprecedented level of complexity, so some form of complexity control is necessary. Alternatively, a suitable dimensionality reduction method can be used. This thesis consists of four major pieces of work, all conducted with the MDL Drug Data Report (MDDR) database: (i) Development of binary kernel discrimination (BKD). (ii) A new algorithm for the kernel machine family, the so-called "parsimonious kernel Fisher discrimination", which is then applied to VS tasks. (iii) Prediction by posterior estimation in VS. (iv) A comparison of four variants of principal component analysis with potential in VS. The experiments show that BKD in conjunction with the Jaccard/Tanimoto coefficient is the best method; the other approaches are less accurate than BKD but remain comparable in a number of cases.
author	Pasupa, Kitsuchart
author_facet	Pasupa, Kitsuchart
author_sort	Pasupa, Kitsuchart
title	Data mining and decision support in pharmaceutical databases
title_short	Data mining and decision support in pharmaceutical databases
title_full	Data mining and decision support in pharmaceutical databases
title_fullStr	Data mining and decision support in pharmaceutical databases
title_full_unstemmed	Data mining and decision support in pharmaceutical databases
title_sort	data mining and decision support in pharmaceutical databases
publisher	University of Sheffield
publishDate	2007
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487615
work_keys_str_mv	AT pasupakitsuchart datamininganddecisionsupportinpharmaceuticaldatabases
_version_	1718504579274899456
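
The description names the Jaccard/Tanimoto coefficient as the similarity measure used with sparse, binary-valued 2D fingerprints. As a rough illustration only, and not the thesis's own implementation, the coefficient can be sketched in Python. The sketch assumes each fingerprint is stored as the set of its on-bit indices; the function name and the example fingerprints are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Jaccard/Tanimoto coefficient between two binary fingerprints.

    Each fingerprint is given as the set of indices of its bits that are 1,
    which suits very sparse fingerprints (an assumption of this sketch).
    """
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints, listed by their on-bit positions.
fp_query = {3, 17, 42, 256, 901}
fp_candidate = {3, 42, 256, 512}

# 3 shared on-bits out of 6 distinct on-bits overall.
similarity = tanimoto(fp_query, fp_candidate)
print(similarity)  # 0.5
```

Storing only the on-bit indices keeps the computation proportional to the number of set bits rather than the fingerprint length, which matters when fingerprints are as sparse as the description says.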

Similar Items