BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION
Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools for expediting drug discovery and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably pre...
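Main Author: | Kokabi, Mahtab |
---|---|
Format: | Others |
Published: | ScholarWorks@UMass Amherst 2021 |
Subjects: | Bioactivity prediction; drug discovery; machine learning; molecular fingerprint; quantitative structure-activity relationship; Wnt signaling; Biomedical; Computational Engineering; Computer Engineering; Electrical and Computer Engineering |
Online Access: | https://scholarworks.umass.edu/masters_theses_2/1139 https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=2153&context=masters_theses_2 |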
id |
ndltd-UMASS-oai-scholarworks.umass.edu-masters_theses_2-2153 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UMASS-oai-scholarworks.umass.edu-masters_theses_2-2153; 2021-10-28T05:22:18Z; 2021-10-20T17:58:56Z; text; application/pdf; Masters Theses; ScholarWorks@UMass Amherst |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
Bioactivity prediction; drug discovery; machine learning; molecular fingerprint; quantitative structure-activity relationship; Wnt signaling; Biomedical; Computational Engineering; Computer Engineering; Electrical and Computer Engineering |
description |
Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools for expediting drug discovery and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably predict drug activity with only a small dataset (size < 100) and to benchmark these small-dataset QSAR models in application-specific studies. To this end, we present a systematic benchmarking study of small-dataset QSAR models built to predict effective Wnt signaling inhibitors, which are essential to therapeutics development for prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models on a training dataset (56 compounds), benchmarked their performance on 4 figures of merit (FOMs), and examined their prediction accuracy on an external validation dataset (14 compounds). Our data show that model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines for developing high-performance small-dataset QSAR models for drug activity prediction. |
author |
Kokabi, Mahtab |
title |
BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION |
publisher |
ScholarWorks@UMass Amherst |
publishDate |
2021 |
url |
https://scholarworks.umass.edu/masters_theses_2/1139 https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=2153&context=masters_theses_2 |
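
The model count given in the description is consistent with a full combination of the three design factors: 4 algorithms × 6 fingerprints × 3 fingerprint lengths = 72 models. Likewise, 56 training compounds plus 14 validation compounds give 70 compounds, of which 20% are held out. The short Python sketch below only checks that arithmetic. This record does not name the algorithms, fingerprints, or lengths, so the labels in the code are hypothetical placeholders and not the choices made in the thesis.

```python
from itertools import product

# Placeholder labels (assumption): the record gives only the counts,
# not the names of the algorithms, fingerprints, or fingerprint lengths.
algorithms = [f"algorithm_{i}" for i in range(1, 5)]      # 4 algorithms
fingerprints = [f"fingerprint_{i}" for i in range(1, 7)]  # 6 fingerprint types
lengths = [f"length_{i}" for i in range(1, 4)]            # 3 fingerprint lengths

# Every (algorithm, fingerprint, length) combination is one candidate model.
model_grid = list(product(algorithms, fingerprints, lengths))
n_models = len(model_grid)

# Dataset sizes as stated in the description.
n_train = 56
n_validation = 14
n_total = n_train + n_validation
validation_fraction = n_validation / n_total

print(f"models in grid: {n_models}")
print(f"compounds: {n_total} (validation share {validation_fraction:.0%})")
```

With real names substituted for the placeholders, the same grid could drive a loop that trains and scores one model per combination.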