Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
Main Authors: | Shunfang Wang, Shuhui Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2015-12-01 |
Series: | International Journal of Molecular Sciences |
Subjects: | |
Online Access: | http://www.mdpi.com/1422-0067/16/12/26237 |
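The abstract describes DipPSSM as a fusion of PSSM with dipeptide composition (DipC), with balance factors weighting each component. The sketch below shows one way such a fusion could look, assuming a weighted concatenation: a 400-dimensional DipC vector and a PSSM collapsed to its 20 column means, each block scaled by its own balance factor. The collapsing step, the function names and the factor values 0.7 and 0.3 are hypothetical; this record does not give the paper's exact construction, and the paper tunes its balance factors with a genetic algorithm rather than fixing them by hand. A real PSSM would typically come from a PSI-BLAST search; a toy matrix stands in for it here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """DipC: relative frequency of each of the 400 adjacent residue pairs."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def pssm_column_means(pssm):
    """Collapse an L x 20 PSSM to 20 values by averaging each column.

    Assumption: one simple fixed-length summary, not necessarily the
    one used for DipPSSM in the paper.
    """
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(20)]


def fuse(parts, balance):
    """Hypothetical fusion: scale each feature block by its balance factor, then concatenate."""
    if len(parts) != len(balance):
        raise ValueError("need exactly one balance factor per feature block")
    fused = []
    for block, weight in zip(parts, balance):
        fused.extend(weight * value for value in block)
    return fused


if __name__ == "__main__":
    seq = "MKVLA"
    # Hypothetical 5 x 20 PSSM standing in for PSI-BLAST output.
    toy_pssm = [[(i + j) % 5 - 2 for j in range(20)] for i in range(len(seq))]
    dip_pssm = fuse([dipeptide_composition(seq), pssm_column_means(toy_pssm)], [0.7, 0.3])
    print(len(dip_pssm))  # 420 = 400 DipC values + 20 PSSM values
```

Keeping the balance factors as separate multipliers on each block means a search procedure such as a genetic algorithm only has to tune a short vector of weights, while the feature extractors stay fixed.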
id | doaj-0c1d3f42bde44bb48e383b53ddc81418 |
---|---|
record_format | Article |
timestamp | 2020-11-24T21:39:45Z |
volume | 16 |
issue | 12 |
pages | 30343–30361 |
doi | 10.3390/ijms161226237 |
affiliation | Shunfang Wang and Shuhui Liu: School of Information Science and Engineering, Yunnan University, Kunming 650504, China |
collection | DOAJ |
issn | 1422-0067 |
description | An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. Existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and the position-specific scoring matrix (PSSM), are insufficient to represent a protein sequence because each captures it from only a single perspective. This paper therefore proposes two fusion feature representations, DipPSSM and PseAAPSSM, which integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors that weight the importance of its components, and we search for their optimal values with a genetic algorithm. Because the proposed representations are high-dimensional, linear discriminant analysis (LDA) is used to find their important low-dimensional structure, which is essential for classification and location prediction. Numerical experiments on two public datasets, using a KNN classifier and cross-validation tests, showed that in terms of the common indexes of sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC), the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the LDA-treated representation outperforms the untreated one. |
topic | protein sub-nuclear localization; DipPSSM; PseAAPSSM; linear discriminant analysis; KNN classifier |
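The abstract's evaluation pipeline (LDA for dimension reduction, then a KNN classifier under cross-validation, scored by sensitivity, specificity, accuracy and MCC) can be sketched in plain Python. For brevity this is the two-class Fisher form of LDA, which projects onto a single direction w = S_w^-1 (mu_1 - mu_0); multi-class sub-nuclear localization would instead keep the leading eigenvectors of S_w^-1 S_b. The choice of leave-one-out (jackknife) cross-validation, the small ridge term, k = 3, the toy data and all function names are assumptions for illustration, not the paper's settings.

```python
import math
from collections import Counter


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(x, y, ridge=1e-6):
    """Two-class Fisher LDA: w = (S_w + ridge * I)^-1 (mu_1 - mu_0)."""
    d = len(x[0])
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    mus = {}
    for label in (0, 1):
        members = [v for v, t in zip(x, y) if t == label]
        mus[label] = mu = mean(members)
        for v in members:  # accumulate the within-class scatter matrix
            diff = [a - b for a, b in zip(v, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [a - b for a, b in zip(mus[1], mus[0])])


def project(w, v):
    return sum(a * b for a, b in zip(w, v))


def knn_predict(train_z, train_y, z, k=3):
    """Majority vote among the k training samples nearest to z on the LDA axis."""
    nearest = sorted(range(len(train_z)), key=lambda i: abs(train_z[i] - z))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def jackknife_predictions(x, y, k=3):
    """Leave-one-out CV: refit LDA without sample i, then classify sample i."""
    preds = []
    for i in range(len(x)):
        train_x, train_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        w = fisher_direction(train_x, train_y)
        train_z = [project(w, v) for v in train_x]
        preds.append(knn_predict(train_z, train_y, project(w, x[i]), k))
    return preds


def scores(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and MCC with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc


if __name__ == "__main__":
    # Hypothetical 2-D features standing in for fused DipPSSM/PseAAPSSM vectors.
    x = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 0], [10, 1], [11, 0], [11, 1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(scores(y, jackknife_predictions(x, y)))  # prints (1.0, 1.0, 1.0, 1.0)
```

Refitting LDA inside every fold, rather than once on all samples, keeps the held-out protein out of the projection it is classified on; otherwise the cross-validation estimate would be optimistic.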