A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector

Bibliographic Details
Main Authors: Marwa A. Abd Elwahaab, Mervat M. Abo-Elkhier, Moheb I. Abo el Maaty
Format: Article
Language: English
Published: Hindawi Limited 2019-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2019/8702968
Description
Summary: Similarity/dissimilarity analysis is a key way of understanding the biology of an organism by identifying the origin of new genes/sequences. Sequence data are grouped according to biological relationships, and the number of sequences in any group is liable to grow every day. All present alignment-free methods demonstrate the utility of their approaches by producing a similarity/dissimilarity matrix. Although this matrix is clear, it measures the degree of similarity among sequences individually. In our work, a representative of each of three groups of protein sequences is introduced. Based on the group representative, a similarity/dissimilarity vector is evaluated instead of the ordinary similarity/dissimilarity matrix. The approach is applied to three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-group comparison is produced to confirm the distinctness of each group. A qualitative comparison between our approach, previous articles, and the phylogenetic tree of these protein sequences demonstrated the utility of our approach.
ISSN: 2314-6133, 2314-6141