Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds

Drug discovery is a process that takes many years and hundreds of millions of dollars to reveal a confident conclusion about a specific treatment. Part of this sophisticated process is based on preliminary investigations to suggest a set of chemical compounds as candidate drugs for the treatment. Co...

Bibliographic Details
Main Author:	Soufan, Othman
Other Authors:	Bajic, Vladimir B.
Language:	en
Published:	2016
Subjects:	high-throughput screening; Data Mining; virtual screening; Feature Selection; multilabel learning
Online Access:	Soufan, O. (2016). Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds. KAUST Research Repository. https://doi.org/10.25781/KAUST-UY8Y6 http://hdl.handle.net/10754/621873

id	ndltd-kaust.edu.sa-oai-repository.kaust.edu.sa-10754-621873
record_format	oai_dc
spelling	ndltd-kaust.edu.sa-oai-repository.kaust.edu.sa-10754-621873 2021-08-30T05:09:27Z Soufan, Othman Bajic, Vladimir B. Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division Kalnis, Panos Arold, Stefan T. Gojobori, Takashi Schonbach, Christian 2016-11-24T08:43:17Z 2017-11-23T00:00:00Z 2016-11-23 Dissertation 10.25781/KAUST-UY8Y6 http://hdl.handle.net/10754/621873 en 2017-11-23 At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2017-11-23.
collection	NDLTD
language	en
sources	NDLTD
topic	high-throughput screening; Data Mining; virtual screening; Feature Selection; multilabel learning
description	Drug discovery is a process that takes many years and hundreds of millions of dollars to reach a confident conclusion about a specific treatment. Part of this sophisticated process relies on preliminary investigations that suggest a set of chemical compounds as candidate drugs for the treatment. Computational resources have played a significant role in this part through a step known as virtual screening. From a data mining perspective, the availability of rich data resources is key to training prediction models. Yet the difficulties imposed by the rapid growth of data and its dimensionality are inevitable. In this thesis, I address the main challenges that arise when data mining techniques are used for virtual screening. To achieve efficient virtual screening using data mining, I start by addressing the problem of feature selection and provide an analysis of the best ways to describe a chemical compound for enhanced screening performance. High-throughput screening (HTS) assay data used for virtual screening are characterized by a great class imbalance. To handle this class imbalance, I suggest using a novel algorithm called DRAMOTE to narrow down promising candidate chemicals aimed at interaction with specific molecular targets before they are experimentally evaluated. Existing approaches are mostly proposed for small-scale virtual screening that makes use of a few thousand interactions. Thus, I propose enabling large-scale (or big) virtual screening by learning from millions of interactions while exploiting any relevant dependency for better accuracy. A novel solution called DRABAL, which incorporates structure learning of a Bayesian network as a step to model dependencies between the HTS assays, is shown to achieve significant improvements over existing state-of-the-art approaches.
author2	Bajic, Vladimir B.
author_facet	Bajic, Vladimir B. Soufan, Othman
author	Soufan, Othman
author_sort	Soufan, Othman
title	Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds
publishDate	2016
url	Soufan, O. (2016). Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds. KAUST Research Repository. https://doi.org/10.25781/KAUST-UY8Y6 http://hdl.handle.net/10754/621873

Similar Items