SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform
Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis u...
Main Authors: | Jie Lin, Jing Wei, Donald Adjeroh, Bing-Hua Jiang, Yue Jiang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-05-01 |
Series: | BMC Bioinformatics |
Subjects: | k-mers; Wavelet transform; Complex numbers; Sequence similarity; Frequency domain |
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2155-9 |
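
The abstract describes a pipeline of k-mer extraction, mapping each k-mer to a complex number, and a stationary discrete wavelet transform that yields a numeric feature vector. The following is a minimal sketch of that idea, not the authors' implementation. The nucleotide codes in `BASE_CODE`, the averaging in `kmer_to_complex`, the choice of a Haar filter, and the sub-band energy summary in `feature_vector` are all assumptions made for illustration; the record does not specify any of them.

```python
import math

# Hypothetical nucleotide-to-complex codes; the paper's actual mapping may differ.
BASE_CODE = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def kmers(seq, k):
    """Return all overlapping k-mers of seq (sliding window, step 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer):
    """Map a k-mer to one complex number: the mean of its base codes (an assumption)."""
    return sum(BASE_CODE[b] for b in kmer) / len(kmer)


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with circular boundary handling.

    Returns (approx, details): the final approximation and one detail list
    per level, each with the same length as the input signal.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # dilate the two-tap filter instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / math.sqrt(2) for a, b in zip(approx, shifted)])
        approx = [(a + b) / math.sqrt(2) for a, b in zip(approx, shifted)]
    return approx, details


def feature_vector(seq, k=3, levels=3):
    """Fixed-length features: mean energy of each sub-band (a hypothetical summary)."""
    signal = [kmer_to_complex(m) for m in kmers(seq.upper(), k)]
    if not signal:
        raise ValueError("sequence is shorter than k")
    approx, details = haar_swt(signal, levels)
    bands = details + [approx]
    return [sum(abs(c) ** 2 for c in band) / len(band) for band in bands]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Calling `feature_vector` on two sequences and passing the results to `distance` gives a dissimilarity score that could feed a clustering or classification step. Because the transform is undecimated, every sub-band keeps the length of the input signal, so some summary step (here, one energy value per band) is needed to compare sequences of different lengths. Sequences containing symbols outside A, C, G, T would raise a `KeyError` in this sketch.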
id |
doaj-fa78399b24664885b918fee698f4294d |
---|---|
record_format |
Article |
spelling |
doaj-fa78399b24664885b918fee698f4294d; 2020-11-24T21:49:15Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-05-01; 191111; 10.1186/s12859-018-2155-9; SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform; Jie Lin (0), Jing Wei (1), Donald Adjeroh (2), Bing-Hua Jiang (3), Yue Jiang (4); (0) College of Mathematics and Informatics, Fujian Normal University; (1) College of Mathematics and Informatics, Fujian Normal University; (2) Lane Department of Computer Science and Electrical Engineering, West Virginia University; (3) Department of Pathology, University of Iowa; (4) College of Mathematics and Informatics, Fujian Normal University; Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Conclusions: Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These properties make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.; http://link.springer.com/article/10.1186/s12859-018-2155-9; k-mers; Wavelet transform; Complex numbers; Sequence similarity; Frequency domain |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jie Lin Jing Wei Donald Adjeroh Bing-Hua Jiang Yue Jiang |
author_sort |
Jie Lin |
title |
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2018-05-01 |
description |
Abstract Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Conclusions: Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These properties make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications. |
topic |
k-mers Wavelet transform Complex numbers Sequence similarity Frequency domain |
url |
http://link.springer.com/article/10.1186/s12859-018-2155-9 |