Automated Training for Algorithms That Learn from Genomic Data
Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not in...
Main Authors: | Gokcen Cilingir, Shira L. Broschat |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2015-01-01 |
Series: | BioMed Research International |
Online Access: | http://dx.doi.org/10.1155/2015/234236 |
id |
doaj-f96c2e16c8e84e91bba71fc670266d97 |
---|---|
record_format |
Article |
spelling |
doaj-f96c2e16c8e84e91bba71fc670266d97; 2020-11-24T22:19:40Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2015-01-01; 2015; 10.1155/2015/234236; 234236; Automated Training for Algorithms That Learn from Genomic Data; Gokcen Cilingir (0); Shira L. Broschat (1); (0) School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA; (1) School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA; http://dx.doi.org/10.1155/2015/234236 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Gokcen Cilingir Shira L. Broschat |
title |
Automated Training for Algorithms That Learn from Genomic Data |
publisher |
Hindawi Limited |
series |
BioMed Research International |
issn |
2314-6133 2314-6141 |
publishDate |
2015-01-01 |
description |
Supervised machine learning algorithms are used by life scientists for a variety of objectives.
Expert-curated public gene and protein databases are major resources for gathering data to
train these algorithms. While these data resources are continuously updated, generally, these
updates are not incorporated into published machine learning algorithms which thereby can
become outdated soon after their introduction. In this paper, we propose a new model of
operation for supervised machine learning algorithms that learn from genomic data. By defining
these algorithms in a pipeline in which the training data gathering procedure and the learning
process are automated, one can create a system that generates a classifier or predictor using
information available from public resources. The proposed model is explained using three case
studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are
utilized in pipelines. Given that the vast majority of the procedures described for gathering
training data can easily be automated, it is possible to transform valuable machine learning
algorithms into self-evolving learners that benefit from the ever-changing data available for
gene products and to develop new machine learning algorithms that are similarly capable. |
url |
http://dx.doi.org/10.1155/2015/234236 |