SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform

Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis u...

Bibliographic Details
Main Authors: Jie Lin, Jing Wei, Donald Adjeroh, Bing-Hua Jiang, Yue Jiang
Format: Article
Language: English
Published: BMC 2018-05-01
Series: BMC Bioinformatics
Subjects: k-mers; Wavelet transform; Complex numbers; Sequence similarity; Frequency domain
Online Access: http://link.springer.com/article/10.1186/s12859-018-2155-9
id doaj-fa78399b24664885b918fee698f4294d
record_format Article
spelling doaj-fa78399b24664885b918fee698f4294d
2020-11-24T21:49:15Z
eng
BMC
BMC Bioinformatics
1471-2105
2018-05-01
19
1
1
11
10.1186/s12859-018-2155-9
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform
Jie Lin 0
Jing Wei 1
Donald Adjeroh 2
Bing-Hua Jiang 3
Yue Jiang 4
0 College of Mathematics and Informatics, Fujian Normal University
1 College of Mathematics and Informatics, Fujian Normal University
2 Lane Department of Computer Science and Electrical Engineering, West Virginia University
3 Department of Pathology, University of Iowa
4 College of Mathematics and Informatics, Fujian Normal University
Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Conclusions: Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
http://link.springer.com/article/10.1186/s12859-018-2155-9
k-mers
Wavelet transform
Complex numbers
Sequence similarity
Frequency domain
collection DOAJ
language English
format Article
sources DOAJ
author Jie Lin
Jing Wei
Donald Adjeroh
Bing-Hua Jiang
Yue Jiang
author_sort Jie Lin
title SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-05-01
description Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Conclusions: Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
topic k-mers
Wavelet transform
Complex numbers
Sequence similarity
Frequency domain
url http://link.springer.com/article/10.1186/s12859-018-2155-9
_version_ 1725888457842098176
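
The pipeline summarized in the abstract (k-mer extraction, mapping each k-mer to a complex number, a stationary wavelet transform, and a numeric feature vector) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the nucleotide encoding in `BASE_CODE`, the positional weighting in `kmer_to_complex`, the choice of the Haar wavelet with periodic boundaries, and the per-band energy summary in `feature_vector` are all assumptions, since the record does not specify them.

```python
import math

# Hypothetical nucleotide-to-complex encoding (an assumption; the record
# does not give the mapping SSAW actually uses).
BASE_CODE = {"A": 1 + 1j, "C": -1 + 1j, "G": -1 - 1j, "T": 1 - 1j}


def kmers(seq, k):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer):
    """Map a k-mer to one complex number (illustrative weighting, an assumption)."""
    return sum(BASE_CODE[b] / 2 ** i for i, b in enumerate(kmer))


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with periodic boundaries.

    Returns (approx, details): the final approximation and one list of
    detail coefficients per level, each the same length as the input.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # spread the filter taps instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / math.sqrt(2) for a, b in zip(approx, shifted)])
        approx = [(a + b) / math.sqrt(2) for a, b in zip(approx, shifted)]
    return approx, details


def feature_vector(seq, k=3, levels=3):
    """Summarize a sequence as mean energy per wavelet band (an assumed summary).

    The sequence must contain at least k symbols drawn from A, C, G, T.
    """
    signal = [kmer_to_complex(km) for km in kmers(seq, k)]
    approx, details = haar_swt(signal, levels)
    bands = details + [approx]
    return [sum(abs(c) ** 2 for c in band) / len(band) for band in bands]
```

Under these assumptions, every sequence yields a vector of `levels + 1` real numbers regardless of its length, so two sequences could be compared with an ordinary distance such as `math.dist(feature_vector(a), feature_vector(b))` before clustering or classification.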