BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools for expediting drug discovery and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably pre...

Bibliographic Details
Main Author: Kokabi, Mahtab
Format: Others
Published: ScholarWorks@UMass Amherst 2021
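Subjects: Bioactivity prediction; drug discovery; machine learning; molecular fingerprint; quantitative structure-activity relationship; Wnt signaling; Biomedical; Computational Engineering; Computer Engineering; Electrical and Computer Engineering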
Online Access: https://scholarworks.umass.edu/masters_theses_2/1139
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=2153&context=masters_theses_2
id ndltd-UMASS-oai-scholarworks.umass.edu-masters_theses_2-2153
record_format oai_dc
spelling ndltd-UMASS-oai-scholarworks.umass.edu-masters_theses_2-2153 2021-10-28T05:22:18Z 2021-10-20T17:58:56Z text application/pdf Masters Theses
collection NDLTD
format Others
sources NDLTD
topic Bioactivity prediction
drug discovery
machine learning
molecular fingerprint
quantitative structure-activity relationship
Wnt signaling
Biomedical
Computational Engineering
Computer Engineering
Electrical and Computer Engineering
description Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools for expediting drug discovery and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably predict drug activity from only a small dataset (size < 100) and to benchmark such small-dataset QSAR models in application-specific studies. To this end, we present a systematic benchmarking study of small-dataset QSAR models built to predict effective Wnt signaling inhibitors, which are essential to therapeutics development for prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models built from 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths (4 × 6 × 3 = 72). We trained these models on a training dataset of 56 compounds, benchmarked their performance on 4 figures of merit (FOMs), and examined their prediction accuracy on an external validation dataset of 14 compounds. Our data show that model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of the drug compounds; 2) algorithms are selected to reduce the number of false predictions caused by class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines for developing high-performance small-dataset QSAR models for drug activity prediction.
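The benchmarking design summarized in the description can be illustrated with a minimal sketch. The abstract gives only the counts (4 algorithms, 6 fingerprints, 3 lengths, 56 training and 14 validation compounds), so every label below is a placeholder, and the seeded random split is an assumption rather than the thesis's documented procedure.

```python
from itertools import product
import random

# Placeholder labels: the abstract states how many of each were used,
# not which ones, so these names are purely illustrative.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 5)]      # 4 algorithms
FINGERPRINTS = [f"fingerprint_{i}" for i in range(1, 7)]  # 6 fingerprint types
LENGTHS = [f"length_{i}" for i in range(1, 4)]            # 3 fingerprint lengths


def model_grid():
    """Enumerate every (algorithm, fingerprint, length) combination."""
    return list(product(ALGORITHMS, FINGERPRINTS, LENGTHS))


def split_compounds(compound_ids, n_validation=14, seed=0):
    """Shuffle compound IDs and hold out n_validation of them.

    Assumption: a seeded random hold-out. The source does not say
    how the 56/14 split was actually made.
    """
    rng = random.Random(seed)
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


grid = model_grid()
train, validation = split_compounds(range(70))
print(len(grid), len(train), len(validation))
```

Each tuple in `grid` stands for one candidate model to be trained on `train` and checked against `validation`; the 70 integer IDs stand in for the 56 + 14 compounds mentioned in the abstract.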
author Kokabi, Mahtab
title BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION
publisher ScholarWorks@UMass Amherst
publishDate 2021
url https://scholarworks.umass.edu/masters_theses_2/1139
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=2153&context=masters_theses_2