Extension of Similarity Functions and their Application to Chemical Informatics Problems
Main Author: | Wood, Nicholas Linder |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2018 |
Subjects: | |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1542299336598615 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1542299336598615 2021-08-03T07:08:53Z Extension of Similarity Functions and their Application to Chemical Informatics Problems Wood, Nicholas Linder Information Science Mathematics Molecular Chemistry Pharmacy Sciences Statistics Theoretical Mathematics Chemical Informatics Chemoinformatics Cheminformatics QSAR Domain of Applicability Machine Learning Kernel Linear Algebra Mathematics Positive Definite Similarity is the most pervasive concept in chemoinformatics and it provides direction for many of the problems which arise in that field. Similarity functions are mathematical tools for quantifying the similarity of one molecule with respect to another molecule. In this work, we developed a method for the quantification of the similarity of one molecule with respect to a set of molecules. This method requires a similarity function which is symmetric and positive definite. If the similarity function meets two additional mild requirements, namely if it is bounded between zero and unity and is unity when evaluated on two identical molecules, then we say that the similarity function is extendable. In this case, the similarity of a molecule with respect to a set containing one molecule reduces to the original similarity function evaluated on those two molecules. We additionally stated and proved several properties of the extension of similarity functions. We then applied the extension of similarity functions to two problems in chemoinformatics. First, we used the extension of similarity functions as the basis for machine learning models for the prediction of various molecular endpoints. These machine learning models were compared to the kNN machine learning model. For each endpoint predicted, the model based on the extension of similarity functions was shown either to be comparable to or to exceed the kNN model. Second, we used the extension of similarity functions as the basis for defining the domain of applicability of a machine learning model. We applied this definition to a kNN model and showed that the extension of similarity functions can be used to order predictions for the rational selection of molecules for further testing. We showed how doing so can increase the overall usefulness of a machine learning model. Finally, we stated several mathematical questions related to the extension of similarity functions which, if answered, could aid in the training of machine learning models based on the extension of similarity functions. 2018 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Information Science Mathematics Molecular Chemistry Pharmacy Sciences Statistics Theoretical Mathematics Chemical Informatics Chemoinformatics Cheminformatics QSAR Domain of Applicability Machine Learning Kernel Linear Algebra Mathematics Positive Definite |
author |
Wood, Nicholas Linder |
title |
Extension of Similarity Functions and their Application to Chemical Informatics Problems |
publisher |
The Ohio State University / OhioLINK |
publishDate |
2018 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615 |
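
The abstract calls a similarity function extendable when it is symmetric, positive definite, bounded between zero and unity, and equal to unity on two identical molecules. The sketch below checks those four conditions on a small sample. It is not taken from the thesis: the Tanimoto coefficient on sets of fingerprint bits, the helper names `tanimoto`, `is_positive_definite`, and `looks_extendable`, and the tolerances are all assumptions made for illustration. A check on a finite sample can only refute extendability, it cannot prove it.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' fingerprint bits (assumed similarity)."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return len(a & b) / union


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def looks_extendable(sim, molecules, tol=1e-9):
    """Check the four stated conditions on a sample of pairwise distinct molecules."""
    n = len(molecules)
    gram = [[sim(x, y) for y in molecules] for x in molecules]
    symmetric = all(
        abs(gram[i][j] - gram[j][i]) <= tol for i in range(n) for j in range(n)
    )
    bounded = all(-tol <= value <= 1 + tol for row in gram for value in row)
    unit_diagonal = all(abs(gram[i][i] - 1.0) <= tol for i in range(n))
    return symmetric and bounded and unit_diagonal and is_positive_definite(gram)


# Hypothetical fingerprints, each a set of 'on' bit positions.
sample = [{1, 2, 3}, {2, 3, 4}, {5}]
print(looks_extendable(tanimoto, sample))
```

The sample must contain pairwise distinct molecules, because a repeated molecule yields two identical rows in the similarity matrix and the positive definiteness check then fails regardless of the similarity function.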