USING MACHINE LEARNING TECHNIQUES FOR IDENTIFICATION AND IMPROVEMENT OF CHEMICAL DESCRIPTORS IN QSAR MODELING.

Bibliographic Details
Main Author: Ribeiro Leite Silva, Joao Vinicius
Language: English
Published: The Ohio State University / OhioLINK 2020
Subjects: Chemical Engineering
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=osu1565994502476339
Abstract:
Evaluating the behavior of a chemical inside a living organism is an essential step in the drug design process. Being able to predict properties related to chemical activity and toxicity can significantly improve the efficiency of developing effective and safe chemical products. The field of cheminformatics strives to use in silico approaches to elucidate such properties. Quantitative Structure-Activity Relationship (QSAR) modeling is a subfield of cheminformatics that leverages experimental data to derive empirical models of a particular chemical property or activity in terms of molecular structure. Creating a QSAR model involves identifying structural features of the compounds in a chemical dataset that can differentiate the compounds with respect to the endpoint of interest. The number of structural features present in the data is frequently enormous, which leads to models that are overly complex and hard to interpret. We present a framework that automatically identifies relevant chemical structures in a chemical dataset in three steps: 1) extract chemical substructures from the dataset; 2) evaluate the discriminative power of each feature using the chi-square statistic, accuracy, and frequency, and keep only the most relevant features; 3) apply hierarchical clustering to identify and remove redundant features. Another aspect of this work is a descriptor/feature generation and consolidation technique based on applying the logical union (OR) to binary features. This idea can be used to cluster structural features into a more general concept without losing the chemical information carried by each variable.
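The filtering, redundancy-removal, and union steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. Each substructure is assumed to be a binary presence vector over the compounds and the endpoint is assumed to be binary. "Accuracy" is taken, hypothetically, as the share of substructure-containing compounds that fall in their majority class. Redundancy is measured here with Jaccard distance under average-linkage agglomerative clustering, because the abstract names neither the distance nor the linkage. All function names, feature names, data, and thresholds are hypothetical.

```python
from itertools import combinations


def chi_square_2x2(feature, labels):
    """Pearson chi-square (no continuity correction) of a binary feature vs. a binary endpoint."""
    n = len(labels)
    table = [[0, 0], [0, 0]]  # table[feature value][label value]
    for f, y in zip(feature, labels):
        table[f][y] += 1
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = rows[f] * cols[y] / n
            if expected > 0:
                chi2 += (table[f][y] - expected) ** 2 / expected
    return chi2


def feature_accuracy(feature, labels):
    """Hypothetical accuracy: share of feature-positive compounds in their majority class."""
    hits = [y for f, y in zip(feature, labels) if f]
    if not hits:
        return 0.0
    actives = sum(hits)
    return max(actives, len(hits) - actives) / len(hits)


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two binary presence vectors."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return 0.0 if union == 0 else 1.0 - inter / union


def filter_features(features, labels, min_freq=2, min_chi2=1.0, min_acc=0.7):
    """Step 2: keep features passing frequency, accuracy, and chi-square thresholds."""
    kept = {}
    for name, col in features.items():
        if sum(col) >= min_freq and feature_accuracy(col, labels) >= min_acc:
            score = chi_square_2x2(col, labels)
            if score >= min_chi2:
                kept[name] = score
    return kept


def remove_redundant(features, scores, max_dist=0.4):
    """Step 3: average-linkage clustering; keep the top-scoring feature per cluster."""
    clusters = [[name] for name in scores]

    def linkage(c1, c2):
        dists = [jaccard_distance(features[a], features[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i, j in combinations(range(len(clusters)), 2))
        if dist > max_dist:
            break  # remaining clusters are not redundant with each other
        clusters[i].extend(clusters.pop(j))  # i < j, so index i is unaffected
    return sorted(max(c, key=scores.get) for c in clusters)


def logical_union(*columns):
    """Union descriptor: 1 if a compound contains any of the given substructures."""
    return [int(any(bits)) for bits in zip(*columns)]


# Toy data (hypothetical): six compounds, binary endpoint, substructure vectors.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "nitro":      [1, 1, 1, 0, 0, 0],
    "nitro_aryl": [1, 1, 0, 0, 0, 0],
    "azide":      [0, 0, 1, 0, 0, 0],
    "amine":      [0, 0, 0, 1, 1, 0],
    "noise":      [1, 0, 1, 0, 1, 0],
}
scores = filter_features(features, labels)   # nitro, nitro_aryl, amine survive
print(remove_redundant(features, scores))    # ['amine', 'nitro']
print(logical_union(features["nitro_aryl"], features["azide"]))  # [1, 1, 1, 0, 0, 0]
```

On the toy data, the rare azide feature and the weakly discriminating noise feature fail the filters, and nitro_aryl is merged into the nitro cluster because the two substructures occur in mostly the same compounds. The union of nitro_aryl and azide reproduces the activity pattern of nitro (chi-square 6.0, versus 3.0 and 1.2 for its members), which illustrates how a union can be more discriminative than either of the features it combines.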
We use a genetic algorithm to generate unions, which creates a new set of variables constructed from chemical structural features. This strategy reduces the dimensionality of the data while achieving high model performance without a loss of model interpretability. We present three endpoints as case studies to test the proposed techniques: blood-brain barrier permeability, Ames mutagenicity, and AIDS antiviral response. We identified descriptors that are known to be mechanistically related to these properties. We also created QSAR models for blood-brain barrier permeability and Ames mutagenicity, using the proposed algorithm to generate the variables used in the models. These models achieved concordances of 75% and 76%, respectively, a performance similar to that of other learning methods applied to these datasets, without loss of interpretability. The information from this work can help guide new experiments and the design of new chemical products.

Date: 2020-02-24
Access: unrestricted
Rights: This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
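The genetic-algorithm step for generating unions could be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Each individual is a bit mask choosing which substructures to OR into a single union descriptor. Fitness is taken, hypothetically, as the concordance (the fraction of correctly classified compounds) of the rule "predict active when the union descriptor is 1". Tournament selection, uniform crossover, bit-flip mutation, elitism, all parameter values, and all function names are hypothetical choices.

```python
import random


def concordance(predicted, observed):
    """Fraction of compounds whose predicted class matches the observed class."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def union_of(mask, features):
    """OR together the substructure columns selected by a bit mask."""
    chosen = [col for bit, col in zip(mask, features) if bit]
    if not chosen:
        return [0] * len(features[0])
    return [int(any(bits)) for bits in zip(*chosen)]


def fitness(mask, features, labels):
    """Hypothetical fitness: concordance of 'predict active if the union is present'."""
    return concordance(union_of(mask, features), labels)


def evolve_union(features, labels, pop_size=20, generations=30,
                 mutation_rate=0.1, seed=0):
    """Search for a bit mask whose union descriptor maximizes the fitness."""
    rng = random.Random(seed)
    n = len(features)

    def score(mask):
        return fitness(mask, features, labels)

    # Start from every single-substructure mask, then pad with random masks.
    population = [[int(i == k) for i in range(n)] for k in range(n)]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n)])

    def tournament():
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        children = [max(population, key=score)]  # elitism: keep the best mask
        while len(children) < len(population):
            p1, p2 = tournament(), tournament()
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children

    best = max(population, key=score)
    return best, score(best)


# Toy data (hypothetical): substructures A and B each cover half of the actives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = [
    [1, 1, 0, 0, 0, 0, 0, 0],  # A
    [0, 0, 1, 1, 0, 0, 0, 0],  # B
    [0, 0, 0, 0, 1, 1, 0, 0],  # C, found only in inactives
    [1, 0, 1, 0, 0, 0, 1, 0],  # D, weakly related to the endpoint
]
mask, best_score = evolve_union(features, labels)
print(mask, best_score)
```

Because the initial population contains every single-substructure mask and the best mask is carried into each new generation, the returned union scores at least as well as the best individual substructure under this fitness. On the toy data, A alone or B alone reaches a concordance of 0.75, while their union (mask [1, 1, 0, 0]) reaches 1.0. Building a full set of union variables, as the abstract describes, would need a multi-group encoding or repeated runs, which this sketch leaves out.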