Interpol: An R package for preprocessing of protein sequences

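Abstract

Background: Most machine learning techniques currently applied in the literature need a fixed dimensionality of input data. However, this requirement is frequently violated by real input data, such as DNA and protein sequences, that often differ in length due to insertions and deletions. It is also notable that performance in classification and regression is often improved by numerical encoding of amino acids, compared to the commonly used sparse encoding.

Results: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used for preprocessing of amino acid sequences for classification or regression.

Conclusions: The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and it will in many cases improve their performance in classification and regression.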
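
The two preprocessing steps named in this record's abstract (numerical encoding of amino acids, then normalization to a uniform length by interpolation) can be illustrated with a minimal Python sketch. It is not Interpol's code or API: the function names `encode` and `linear_interpolate` are hypothetical, the Kyte-Doolittle hydropathy scale is assumed here as one example descriptor, and plain linear interpolation is assumed as the simplest of the interpolation methods the abstract mentions.

```python
# Illustrative sketch only; names and the choice of descriptor are assumptions.

# Kyte-Doolittle hydropathy values, used here as one example descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, descriptor=KYTE_DOOLITTLE):
    """Map each amino acid letter to its numerical descriptor value."""
    return [descriptor[residue] for residue in sequence.upper()]


def linear_interpolate(values, target_length):
    """Stretch or shrink a numeric vector to target_length by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if target_length < 2:
        raise ValueError("target_length must be at least 2")
    n = len(values)
    if n == 1:
        # A single value cannot be interpolated; repeat it.
        return [float(values[0])] * target_length
    # Spacing of the new sample points, measured in original index units.
    step = (n - 1) / (target_length - 1)
    result = []
    for i in range(target_length):
        pos = i * step
        left = min(int(pos), n - 2)   # index of the left neighbour
        frac = pos - left             # distance from the left neighbour
        result.append(values[left] * (1 - frac) + values[left + 1] * frac)
    return result


# Sequences of different lengths end up as vectors of equal length.
vectors = [linear_interpolate(encode(s), 5) for s in ("ACDEFGH", "AIL")]
```

After this step every sequence is represented by a vector of the same dimensionality, which is the precondition for the fixed-input machine learning methods the abstract refers to.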


Bibliographic Details
Main Authors:	Heider Dominik, Hoffmann Daniel
Format:	Article
Language:	English
Published:	BMC 2011-06-01
Series:	BioData Mining
Online Access:	http://www.biodatamining.org/content/4/1/16

id	doaj-a979035e72cc4677b263cb02cb06a239
record_format	Article
spelling	doaj-a979035e72cc4677b263cb02cb06a239 2020-11-24T23:56:30Z eng BMC BioData Mining 1756-0381 2011-06-01 4 1 16 10.1186/1756-0381-4-16 Interpol: An R package for preprocessing of protein sequences Heider Dominik Hoffmann Daniel http://www.biodatamining.org/content/4/1/16
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Heider Dominik Hoffmann Daniel
title	Interpol: An R package for preprocessing of protein sequences
publisher	BMC
series	BioData Mining
issn	1756-0381
publishDate	2011-06-01
description	Abstract. Background: Most machine learning techniques currently applied in the literature need a fixed dimensionality of input data. However, this requirement is frequently violated by real input data, such as DNA and protein sequences, that often differ in length due to insertions and deletions. It is also notable that performance in classification and regression is often improved by numerical encoding of amino acids, compared to the commonly used sparse encoding. Results: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used for preprocessing of amino acid sequences for classification or regression. Conclusions: The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and it will in many cases improve their performance in classification and regression.
url	http://www.biodatamining.org/content/4/1/16
work_keys_str_mv	AT heiderdominik interpolanrpackageforpreprocessingofproteinsequences AT hoffmanndaniel interpolanrpackageforpreprocessingofproteinsequences
_version_	1725458106162348032

Similar Items