An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations
The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Since reading a DNA sequence is onerous because of its massive length, it is common in many DNA analysis applications to divide DNA strands into small segments or fra...
Main Authors: | Mohamed Abdel-Basset, Reda Mohamed, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
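Subjects: | DNA sequence, DNA fragments assembly problem, overlap-layout-consensus, whale optimization algorithm |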
Online Access: | https://ieeexplore.ieee.org/document/9293303/ |
id |
doaj-9c054b8573554d30bf51bf774bc83ac2 |
---|---|
record_format |
Article |
spelling |
doaj-9c054b8573554d30bf51bf774bc83ac2; 2021-03-30T04:30:44Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 222144-222167; DOI 10.1109/ACCESS.2020.3044857; document 9293303
An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations
Mohamed Abdel-Basset (https://orcid.org/0000-0002-2794-3936), Faculty of Computers and Informatics, Zagazig University, Zagazig, Egypt
Reda Mohamed (https://orcid.org/0000-0002-1903-4062), Faculty of Computers and Informatics, Zagazig University, Zagazig, Egypt
Karam M. Sallam (https://orcid.org/0000-0003-4039-1897), Faculty of Computers and Informatics, Zagazig University, Zagazig, Egypt
Ripon K. Chakrabortty (https://orcid.org/0000-0002-7373-0149), Capability Systems Centre, School of Engineering and Information Technology, UNSW Canberra at the Australian Defence Force Academy, Campbell, ACT, Australia
Michael J. Ryan (https://orcid.org/0000-0002-6335-3773), Capability Systems Centre, School of Engineering and Information Technology, UNSW Canberra at the Australian Defence Force Academy, Campbell, ACT, Australia
https://ieeexplore.ieee.org/document/9293303/
DNA sequence; DNA fragments assembly problem; overlap-layout-consensus; whale optimization algorithm |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mohamed Abdel-Basset, Reda Mohamed, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan |
title |
An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Because a DNA sequence is too long to read in one pass, many DNA analysis applications divide DNA strands into small segments, or fragments, which must be reassembled after analysis. This reassembly, the DNA fragment assembly problem (DFAP), is NP-hard, so no polynomial-time exact algorithm for it is known. This paper proposes a new assembler for tackling the DFAP based on the overlap-layout-consensus (OLC) approach. The proposed assembler adapts a discrete whale optimization algorithm (DWOA), using standard operators adopted from evolutionary algorithms to simulate the strategy of humpback whales when searching for prey. For the first time, we formulate the behaviors of whales so that they can be applied directly to any discrete optimization problem, based on three primary operations: a swap-based best-position operator, an ordered crossover operator, and selection of a random whale, which together perform the exploitation and exploration phases of the algorithm. These operations were carefully designed to preserve the methodology of the original whale algorithm. DFAP is a multi-objective problem that seeks the order of segments that maximizes the overlap score and minimizes the number of contigs (sets of overlapping DNA segments), ideally composing a one-contig DNA strand. Existing local search methods, such as the many-non-conflicting-movements variant of problem aware local search (PALS2-many), tend to become trapped in local optima. Hence, integrating DWOA with PALS2-many improves the search capability for finding the optimal order of fragments. In addition, we propose a new variation of PALS2-many that pursues the two objectives of DFAP simultaneously. 
Our proposed DWOA was compared with a number of the most recent robust assemblers: a hybrid crow search algorithm for solving the DFAP (CSA-P2M*Fit), P2M*Fit, and a hybrid genetic algorithm (GA-P2M*Fit). The experimental results and statistical analyses of the proposed DWOA on thirty benchmark instances show that DWOA significantly outperforms those algorithms in reaching fewer contigs, in addition to being competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit for the overlap score. |
topic |
DNA sequence; DNA fragments assembly problem; overlap-layout-consensus; whale optimization algorithm |
url |
https://ieeexplore.ieee.org/document/9293303/ |
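
The description above names two DFAP objectives (overlap score and number of contigs) and an ordered crossover operator, but the record gives no formulas for them. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. The overlap score is approximated by the longest exact suffix-prefix match, where real assemblers typically use an alignment score. The `cutoff` below which two neighbouring fragments are counted as a contig break is a hypothetical parameter. The crossover is a simplified order-crossover variant that copies a slice from one parent and fills the remaining positions left to right from the other. The names `overlap`, `evaluate`, and `ordered_crossover` are chosen here for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Assumption: exact matching stands in for an alignment-based score.
    """
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def evaluate(order: Sequence[int], frags: Sequence[str],
             cutoff: int = 3) -> Tuple[int, int]:
    """Return (total overlap score, number of contigs) for a fragment order.

    A new contig is counted whenever two adjacent fragments overlap by
    fewer than `cutoff` characters (hypothetical threshold).
    """
    score, contigs = 0, 1
    for i, j in zip(order, order[1:]):
        w = overlap(frags[i], frags[j])
        score += w
        if w < cutoff:
            contigs += 1
    return score, contigs


def ordered_crossover(p1: Sequence[int], p2: Sequence[int],
                      rng: random.Random) -> List[int]:
    """Simplified order crossover on two permutations of the same items.

    Copies a random slice of `p1` into the child, then fills the empty
    positions left to right with the remaining items in `p2`'s order.
    """
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    child: List[Optional[int]] = [None] * n
    child[lo:hi] = p1[lo:hi]
    kept = set(p1[lo:hi])
    fill = (g for g in p2 if g not in kept)
    for idx in range(n):
        if child[idx] is None:
            child[idx] = next(fill)
    return [g for g in child if g is not None]


if __name__ == "__main__":
    frags = ["ACGTAC", "TACGGA", "GGATTC", "TTCAAG"]
    print(evaluate([0, 1, 2, 3], frags))  # fragments in their chained order
    print(evaluate([2, 0, 3, 1], frags))  # a shuffled order
    print(ordered_crossover([0, 1, 2, 3], [3, 2, 1, 0], random.Random(0)))
```

A search procedure such as the DWOA described in the record would prefer orders with a higher first value and a lower second value from `evaluate`. How the paper combines or ranks the two objectives is not stated in this record.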