Effective Feature Selection for Classification of Promoter Sequences.

Bibliographic Details
Main Authors: Kouser K, Lavanya P G, Lalitha Rangarajan, Acharya Kshitish K
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5158321?pdf=render
DOI: 10.1371/journal.pone.0167165
ISSN: 1932-6203
Volume: 11, Issue 12, Article e0167165
Collection: DOAJ (record doaj-34c4b799166a496cacba18d403c78a1d)

Abstract

Exploring novel computational methods for making sense of biological data has been not only necessary but also productive. Part of this trend is the search for more efficient in silico methods and tools for analyzing promoters, the parts of DNA sequences involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in function depending on their nucleotide sequence and on the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be driven largely by the selective presence and/or arrangement of these motifs. Here, we explore the computational classification of promoter sequences based on the pattern of motif distributions, since such classification could open a new route to the functional analysis of promoters and to the discovery of functionally crucial motifs. We use Position Specific Motif Matrix (PSMM) features to explore whether promoter sequences can be classified accurately with some popular classification techniques. Classification results on the complete feature set are poor, perhaps because of the very large number of features. We therefore propose two ways of reducing the number of features, and our tests show that classification improves after this reduction. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor), and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform popular feature transformation methods such as PCA (Principal Component Analysis) and SVD (Singular Value Decomposition). They are also as accurate as the feature selection method MRMR (Minimum Redundancy Maximum Relevance) but much faster. Such methods could be useful for categorizing new promoters and exploring the regulatory mechanisms of gene expression in complex eukaryotic species.
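The abstract benchmarks the proposed methods against MRMR. As background only, here is a minimal sketch of the common greedy, difference-form MRMR criterion: at each step it picks the candidate whose mutual information with the class label, minus its mean mutual information with the features already chosen, is largest. It assumes discrete features (here, hypothetical 0/1 motif-presence flags) and simple plug-in mutual-information estimates. The toy data, the function names `mutual_information` and `mrmr`, and the binary encoding are illustrative assumptions; this is not the paper's PSMM feature construction or either of its two reduction methods.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in Counter(zip(x, y)).items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedy MRMR (difference criterion): relevance minus mean redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    # Running sum of I(f_j; f_s) over the already selected features s.
    redundancy_sum = [0.0] * len(features)
    selected: List[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        if selected:
            scores = [relevance[j] - redundancy_sum[j] / len(selected) for j in remaining]
        else:
            scores = [relevance[j] for j in remaining]  # first pick: relevance only
        best = remaining[scores.index(max(scores))]
        selected.append(best)
        remaining.remove(best)
        # Only the newly selected feature adds redundancy terms for the candidates.
        for j in remaining:
            redundancy_sum[j] += mutual_information(features[j], features[best])
    return selected


# Hypothetical toy data: 8 sequences, 1 = motif present.
labels = [0, 0, 1, 0, 1, 1, 1, 1]
a = [0, 0, 0, 0, 1, 1, 1, 1]        # most relevant feature
a_noisy = [0, 0, 0, 0, 1, 1, 1, 0]  # near-copy of a: relevant but redundant
c = [0, 0, 1, 1, 0, 0, 1, 1]        # weakly relevant, independent of a
features = [a, a_noisy, c]

print(mrmr(features, labels, 2))  # [0, 2]
```

Ranking by relevance alone would keep the near-duplicate feature 1; MRMR instead trades it for the weakly relevant but complementary feature 2. Because each step computes one mutual-information estimate per remaining candidate against the newly selected feature, the cost grows roughly with k times the number of features, which may become significant when, as the abstract notes, the feature set is very large.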