Automated Training for Algorithms That Learn from Genomic Data

Bibliographic Details
Main Authors: Gokcen Cilingir, Shira L. Broschat
Format: Article
Language: English
Published: Hindawi Limited 2015-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2015/234236
id doaj-f96c2e16c8e84e91bba71fc670266d97
record_format Article
spelling doaj-f96c2e16c8e84e91bba71fc670266d97; 2020-11-24T22:19:40Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2015-01-01; 2015; 10.1155/2015/234236; 234236; Automated Training for Algorithms That Learn from Genomic Data; Gokcen Cilingir (0); Shira L. Broschat (1); (0) School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA; (1) School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA; http://dx.doi.org/10.1155/2015/234236
collection DOAJ
language English
format Article
sources DOAJ
author Gokcen Cilingir
Shira L. Broschat
title Automated Training for Algorithms That Learn from Genomic Data
publisher Hindawi Limited
series BioMed Research International
issn 2314-6133
2314-6141
publishDate 2015-01-01
description Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable.
url http://dx.doi.org/10.1155/2015/234236