Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.

Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need for reducing dependence on animal testing has led to an increasing interest in alternative methods such as in silico modelling. The QSAR...

Full description

Bibliographic Details
Main Author: Malazizi, Ladan
Other Authors: Neagu, Daniel
Language:en
Published: University of Bradford 2010
Subjects:
Online Access:http://hdl.handle.net/10454/4262
id ndltd-BRADFORD-oai-bradscholars.brad.ac.uk-10454-4262
record_format oai_dc
spelling ndltd-BRADFORD-oai-bradscholars.brad.ac.uk-10454-42622019-09-24T03:02:00Z Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation. Malazizi, Ladan Neagu, Daniel Graves-Morris, Peter R. Predictive toxicology Toxicity data Pesticides Artificial intelligence Data quality Data generation Model performance QSAR Classification algorithm Clustering Imbalanced dataset Endpoints Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need for reducing dependence on animal testing has led to an increasing interest in alternative methods such as in silico modelling. The QSAR (Quantitative Structure Activity Relationship)-based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and also in result analysis for toxicity testing processes. This research addresses number of issues in predictive toxicology. One issue is the problem of data quality. Although large amount of toxicity data is available from online sources, this data may contain some unreliable samples and may be defined as of low quality. Its presentation also might not be consistent throughout different sources and that makes the access, interpretation and comparison of the information difficult. To address this issue we started with detailed investigation and experimental work on DEMETRA data. The DEMETRA datasets have been produced by the EC-funded project DEMETRA. Based on the investigation, experiments and the results obtained, the author identified a number of data quality criteria in order to provide a solution for data evaluation in toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis was the missing values in datasets for toxicology domain. Least Square Method for a paired dataset and Serial Correlation for single version dataset provided the solution for the problem in two different situations. A procedural algorithm using these two methods has been proposed in order to overcome the problem of missing values. Another issue we paid attention to in this thesis was modelling of multi-class data sets in which the severe imbalance class samples distribution exists. The imbalanced data affect the performance of classifiers during the classification process. We have shown that as long as we understand how class members are constructed in dimensional space in each cluster we can reform the distribution and provide more knowledge domain for the classifier. 2010-03-16T10:20:27Z 2010-03-16T10:20:27Z 2010-03-16T10:20:27Z 2008 Thesis doctoral PhD http://hdl.handle.net/10454/4262 en <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-nd/3.0/88x31.png" /></a><br />The University of Bradford theses are licenced under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/">Creative Commons Licence</a>. University of Bradford School of Informatics
collection NDLTD
language en
sources NDLTD
topic Predictive toxicology
Toxicity data
Pesticides
Artificial intelligence
Data quality
Data generation
Model performance
QSAR
Classification algorithm
Clustering
Imbalanced dataset
Endpoints
spellingShingle Predictive toxicology
Toxicity data
Pesticides
Artificial intelligence
Data quality
Data generation
Model performance
QSAR
Classification algorithm
Clustering
Imbalanced dataset
Endpoints
Malazizi, Ladan
Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
description Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need for reducing dependence on animal testing has led to an increasing interest in alternative methods such as in silico modelling. The QSAR (Quantitative Structure Activity Relationship)-based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and also in result analysis for toxicity testing processes. This research addresses number of issues in predictive toxicology. One issue is the problem of data quality. Although large amount of toxicity data is available from online sources, this data may contain some unreliable samples and may be defined as of low quality. Its presentation also might not be consistent throughout different sources and that makes the access, interpretation and comparison of the information difficult. To address this issue we started with detailed investigation and experimental work on DEMETRA data. The DEMETRA datasets have been produced by the EC-funded project DEMETRA. Based on the investigation, experiments and the results obtained, the author identified a number of data quality criteria in order to provide a solution for data evaluation in toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis was the missing values in datasets for toxicology domain. Least Square Method for a paired dataset and Serial Correlation for single version dataset provided the solution for the problem in two different situations. A procedural algorithm using these two methods has been proposed in order to overcome the problem of missing values. Another issue we paid attention to in this thesis was modelling of multi-class data sets in which the severe imbalance class samples distribution exists. The imbalanced data affect the performance of classifiers during the classification process. We have shown that as long as we understand how class members are constructed in dimensional space in each cluster we can reform the distribution and provide more knowledge domain for the classifier.
author2 Neagu, Daniel
author_facet Neagu, Daniel
Malazizi, Ladan
author Malazizi, Ladan
author_sort Malazizi, Ladan
title Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
title_short Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
title_full Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
title_fullStr Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
title_full_unstemmed Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.
title_sort development of artificial intelligence-based in-silico toxicity models. data quality analysis and model performance enhancement through data generation.
publisher University of Bradford
publishDate 2010
url http://hdl.handle.net/10454/4262
work_keys_str_mv AT malaziziladan developmentofartificialintelligencebasedinsilicotoxicitymodelsdataqualityanalysisandmodelperformanceenhancementthroughdatageneration
_version_ 1719255399387889664