Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development

Bibliographic Details
Main Author:	Chen, Jonathan Jun Feng
Language:	English
Published:	University of Akron / OhioLINK 2018
Subjects:	Bioinformatics Biology Biochemistry Chemical Engineering Computer Science vHTS virtual high-throughput screening bioinformatics biological informatics machine-learning QSAR quantitative structure-activity relationship cheminformatics chemical chemoinformatics data-mining pipeline PubChem drug discovery
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=akron1524661027035591

id	ndltd-OhioLink-oai-etd.ohiolink.edu-akron1524661027035591
record_format	oai_dc
spelling	ndltd-OhioLink-oai-etd.ohiolink.edu-akron1524661027035591 2021-08-03T07:06:26Z Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development Chen, Jonathan Jun Feng Bioinformatics Biology Biochemistry Chemical Engineering Computer Science vHTS virtual high-throughput screening bioinformatics biological informatics machine-learning QSAR quantitative structure-activity relationship cheminformatics chemical chemoinformatics data-mining pipeline PubChem drug discovery Medicine is a precious commodity that saves, prolongs, or increases the quality of life. However, medicinal active ingredient discovery is challenging and is one of the major bottlenecks to developing new pharmaceuticals. Progressive development of new therapeutic targets and compounds exacerbates the problem as the scale of the drug discovery endeavor increases to an unmanageable size. For example, the National Institutes of Health houses the National Library of Medicine (NLM), which contains an ever-growing archive of genes, proteins, and therapeutic targets as well as candidate compounds. Manual inspection of all compounds and biological targets cannot match the rate at which new information is created and deposited. New methods of data processing and drug candidate consideration are needed. The work presented used and processed data from the NLM to identify new candidates for consideration. The drug discovery pipeline central to this work created models from existing compound-target interaction data that correlated structure to activity. The models were used to identify the next candidates to test. Compound structural information was captured using the Signature molecular descriptor, while models were created using principal component analysis, a genetic algorithm, and support vector machines. 
The models identify new candidates for activity validation experiments in a virtual high-throughput screen of the 72 million compounds in the PubChem Compound database of the National Library of Medicine. The models were retrained to determine if improvement was possible and what might affect improvement resulting from retraining. After activity validation experiments, the activity and structure of candidates and compounds from the training set were compared to identify structure-activity relationships for additional avenues of inquiry. Seven different case studies were conducted to test the robustness of the pipeline in response to changing dataset size and active fraction: Cathepsin L, Factor XIIa, Factor XIa, C1s, SENP8, and PK-M2 with two different datasets. Results from all seven case studies showed that model retraining was beneficial and that the pipeline was more effective at low active fractions. Recommendations for future use include retraining models when possible, extrapolating incrementally, and applying the pipeline to datasets with small active fractions while avoiding large datasets with high active fractions, to maximize pipeline effectiveness and utility. 2018-05-23 English text University of Akron / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=akron1524661027035591 http://rave.ohiolink.edu/etdc/view?acc_num=akron1524661027035591 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	Bioinformatics Biology Biochemistry Chemical Engineering Computer Science vHTS virtual high-throughput screening bioinformatics biological informatics machine-learning QSAR quantitative structure-activity relationship cheminformatics chemical chemoinformatics data-mining pipeline PubChem drug discovery
spellingShingle	Bioinformatics Biology Biochemistry Chemical Engineering Computer Science vHTS virtual high-throughput screening bioinformatics biological informatics machine-learning QSAR quantitative structure-activity relationship cheminformatics chemical chemoinformatics data-mining pipeline PubChem drug discovery Chen, Jonathan Jun Feng Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
author	Chen, Jonathan Jun Feng
author_facet	Chen, Jonathan Jun Feng
author_sort	Chen, Jonathan Jun Feng
title	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
title_short	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
title_full	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
title_fullStr	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
title_full_unstemmed	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
title_sort	data mining/machine learning techniques for drug discovery: computational and experimental pipeline development
publisher	University of Akron / OhioLINK
publishDate	2018
url	http://rave.ohiolink.edu/etdc/view?acc_num=akron1524661027035591
work_keys_str_mv	AT chenjonathanjunfeng dataminingmachinelearningtechniquesfordrugdiscoverycomputationalandexperimentalpipelinedevelopment
_version_	1719453524417314816

Similar Items