Measuring Similarity among Protein Sequences Using a New Descriptor

The comparison of protein sequences according to similarity is a fundamental aspect of today’s biomedical research. With the development of sequencing technologies, the number of protein sequences in public databases is increasing exponentially. The best-known sequence comparison methods are alignment...

Bibliographic Details
Main Authors: Mervat M. Abo-Elkhier, Marwa A. Abd Elwahaab, Moheb I. Abo El Maaty
Format: Article
Language: English
Published: Hindawi Limited 2019-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2019/2796971
id doaj-62188dfe9e814689a9f859fa494f341f
record_format Article
spelling doaj-62188dfe9e814689a9f859fa494f341f | 2020-11-25T02:31:04Z | eng | Hindawi Limited | BioMed Research International | 2314-6133 | 2314-6141 | 2019-01-01 | 2019 | 10.1155/2019/2796971 | 2796971 | Measuring Similarity among Protein Sequences Using a New Descriptor | Mervat M. Abo-Elkhier, Marwa A. Abd Elwahaab, Moheb I. Abo El Maaty (all: Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt) | http://dx.doi.org/10.1155/2019/2796971
collection DOAJ
sources DOAJ
issn 2314-6133
2314-6141
description The comparison of protein sequences according to similarity is a fundamental aspect of today’s biomedical research. With the development of sequencing technologies, the number of protein sequences in public databases is increasing exponentially. The best-known sequence comparison methods are alignment based. They generally give excellent results when the sequences under study are closely related, but they are time consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with the results of other approaches and with sequence homology.