Benchmarking Small-Dataset Structure-Activity-Relationship Models for Prediction of Wnt Signaling Inhibition


Bibliographic Details
Main Authors: Mahtab Kokabi, Matthew Donnelly, Guangyu Xu
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, vol. 8, pp. 228831-228840
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3046190
Subjects: Bioactivity prediction; drug discovery; machine learning; molecular fingerprint; quantitative structure-activity relationship; Wnt signaling
Online Access: https://ieeexplore.ieee.org/document/9301286/
Affiliation (all authors): Department of Electrical and Computer Engineering, University of Massachusetts at Amherst, Amherst, MA, USA
ORCID: Mahtab Kokabi, https://orcid.org/0000-0001-5228-8358; Guangyu Xu, https://orcid.org/0000-0003-1423-5399
Collection: DOAJ (record ID doaj-5417e5d6dfac408cb30b772aaad22254)

Full description

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools to expedite drug discovery processes and therapeutics development. Given the cost of acquiring large-sized training datasets, it is useful to examine if QSAR analysis can reasonably predict drug activity with only a small-sized dataset (size < 100) and benchmark these small-dataset QSAR models in application-specific studies. To this end, here we present a systematic benchmarking study on small-dataset QSAR models built for prediction of effective Wnt signaling inhibitors, which are essential to therapeutics development in prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models using a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy using an external validation dataset (14 compounds). Our data show that the model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines in developing high-performance small-dataset QSAR models for drug activity prediction.
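The figure of 72 models in the abstract follows from a full factorial design: 4 algorithms × 6 fingerprints × 3 fingerprint lengths. The following is a minimal Python sketch of that grid, together with the stated 56/14 split between training and external validation compounds. It assumes hypothetical placeholder names for the algorithms, fingerprints, and lengths, because the record gives only their counts. It is an illustration of the arithmetic, not the authors' pipeline.

```python
from itertools import product

# Hypothetical placeholders: the record states only the counts
# (4 algorithms, 6 fingerprints, 3 lengths), not the actual choices.
ALGORITHMS = ["algo_A", "algo_B", "algo_C", "algo_D"]
FINGERPRINTS = ["fp_1", "fp_2", "fp_3", "fp_4", "fp_5", "fp_6"]
FP_LENGTHS = ["len_a", "len_b", "len_c"]  # placeholder labels, not real bit lengths


def model_grid():
    """Return every (algorithm, fingerprint, length) combination."""
    return list(product(ALGORITHMS, FINGERPRINTS, FP_LENGTHS))


grid = model_grid()
print(len(grid))  # 72

# Dataset sizes as stated in the abstract.
N_TRAIN, N_VALID = 56, 14
total = N_TRAIN + N_VALID
print(total, N_VALID / total)  # 70 0.2
```

Each tuple in the grid would correspond to one trained model configuration. The 14 validation compounds make up 20% of the 70 compounds in total.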