An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations

Bibliographic Details
Main Authors: Mohamed Abdel-Basset, Reda Mohamed, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/9293303/
ISSN: 2169-3536
Volume: 8
Pages: 222144-222167
DOI: 10.1109/ACCESS.2020.3044857
Affiliations: Mohamed Abdel-Basset, Reda Mohamed, and Karam M. Sallam: Faculty of Computers and Informatics, Zagazig University, Zagazig, Egypt. Ripon K. Chakrabortty and Michael J. Ryan: Capability Systems Centre, School of Engineering and Information Technology, UNSW Canberra at the Australian Defence Force Academy, Campbell, ACT, Australia.
ORCID: Mohamed Abdel-Basset https://orcid.org/0000-0002-2794-3936; Reda Mohamed https://orcid.org/0000-0002-1903-4062; Karam M. Sallam https://orcid.org/0000-0003-4039-1897; Ripon K. Chakrabortty https://orcid.org/0000-0002-7373-0149; Michael J. Ryan https://orcid.org/0000-0002-6335-3773

Abstract
The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Because DNA sequences are extremely long, reading one in full is onerous, so many DNA analysis applications divide DNA strands into small segments, or fragments, which must be reassembled after analysis. The DNA fragment assembly problem (DFAP) of performing this reassembly is NP-hard, so no polynomial-time algorithm for it is known. This paper proposes a new assembler for the DFAP based on the overlap-layout-consensus (OLC) approach. The assembler adapts a discrete whale optimization algorithm (DWOA), which uses standard operators borrowed from evolutionary algorithms to simulate the prey-searching strategy of humpback whales.
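The overlap step of an OLC assembler scores how well the end of one fragment matches the start of another. A minimal sketch, assuming error-free reads from a single strand and exact suffix-prefix matching; the record does not say which overlap measure the paper uses (DFAP studies often use alignment-based scores), and `suffix_prefix_overlap` and `overlap_matrix` are hypothetical names:

```python
from typing import List


def suffix_prefix_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    Exact matching is a simplifying assumption (no sequencing errors,
    no reverse complements).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_matrix(fragments: List[str], min_len: int = 1) -> List[List[int]]:
    """w[i][j] is the overlap obtained by placing fragment i directly before fragment j."""
    n = len(fragments)
    return [
        [0 if i == j else suffix_prefix_overlap(fragments[i], fragments[j], min_len)
         for j in range(n)]
        for i in range(n)
    ]


fragments = ["ATTAGAC", "GACCTG", "CTGAAT"]
print(overlap_matrix(fragments))  # [[0, 3, 1], [0, 0, 3], [2, 0, 0]]
```

Note the asymmetry: w[0][1] is 3 but w[1][0] is 0, which is why the order in which fragments are laid out matters.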
For the first time, we formulate the whales' behaviors so that they can be applied directly to any discrete optimization problem. The formulation rests on three primary operations, which carry out the algorithm's exploitation and exploration phases: a swap-based best-position operator, an ordered crossover operator, and a random-whale selection operation. These operations are designed to preserve the methodology of the original whale optimization algorithm. DFAP is a multi-objective problem: it seeks the fragment order that maximizes the overlap score and minimizes the number of contigs (sets of overlapping DNA segments), with the goal of composing a single-contig DNA strand. Existing local search methods, such as the problem aware local search (PALS) variant that applies many non-conflicting movements (PALS2-many), can become trapped in local optima. Hence, integrating DWOA with PALS2-many improves the search for the optimal fragment order. In addition, we propose a new variation of PALS2-many that pursues both objectives of DFAP simultaneously.

The proposed DWOA was compared with several of the most recent robust assemblers: a hybrid crow search algorithm for the DFAP (CSA-P2M*Fit), P2M*Fit, and a hybrid genetic algorithm (GA-P2M*Fit). Experimental results and statistical analyses on thirty benchmark instances show that DWOA significantly outperforms these algorithms in reaching fewer contigs. On the overlap score, DWOA is competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit.

Keywords: DNA sequence; DNA fragments assembly problem; overlap-layout-consensus; whale optimization algorithm
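The two DFAP objectives can be written as functions of a layout, i.e. a permutation of fragment indices over an overlap matrix `w`. This sketch uses the overlap-score definition common in the DFAP literature (the sum of overlaps between consecutive fragments) and assumes a contig break wherever a consecutive overlap falls below a user-chosen `cutoff`; the threshold rule and the names `overlap_score` and `count_contigs` are assumptions, not taken from the paper:

```python
from typing import List, Sequence


def overlap_score(order: Sequence[int], w: List[List[int]]) -> int:
    """Objective 1 (maximize): total overlap between consecutive fragments."""
    return sum(w[order[i]][order[i + 1]] for i in range(len(order) - 1))


def count_contigs(order: Sequence[int], w: List[List[int]], cutoff: int) -> int:
    """Objective 2 (minimize): number of contigs in the layout.

    A new contig starts wherever two consecutive fragments overlap by less
    than `cutoff` (a hypothetical threshold rule).
    """
    if len(order) == 0:
        return 0
    breaks = sum(1 for i in range(len(order) - 1) if w[order[i]][order[i + 1]] < cutoff)
    return breaks + 1


w = [[0, 3, 1], [0, 0, 3], [2, 0, 0]]  # toy overlap matrix for three fragments
print(overlap_score([0, 1, 2], w), count_contigs([0, 1, 2], w, cutoff=3))  # 6 1
print(overlap_score([2, 1, 0], w), count_contigs([2, 1, 0], w, cutoff=3))  # 0 3
```

The layout 0, 1, 2 joins all three fragments into a single contig, the one-contig outcome DFAP aims for; reversing it breaks every join.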
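Ordered crossover (OX) is a standard permutation operator from evolutionary algorithms: it copies a slice of one parent and fills the remaining slots in the other parent's relative order, so the child is always a valid fragment order. A minimal sketch of the classic OX variant; the record does not say which OX variant DWOA uses or which whales it pairs as parents, and `ordered_crossover` and its `cut` parameter are hypothetical:

```python
import random
from typing import List, Optional, Sequence, Tuple


def ordered_crossover(
    p1: Sequence[int],
    p2: Sequence[int],
    cut: Optional[Tuple[int, int]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Classic ordered crossover (OX) for permutations of fragment indices.

    The child keeps p1's genes at positions a..b (inclusive). The other slots,
    visited from b + 1 with wrap-around, receive the missing genes in the
    cyclic order they appear in p2, also read from b + 1. Which OX variant
    DWOA uses is not stated in the record; this one is an assumption.
    """
    n = len(p1)
    if cut is None:
        rng = rng or random.Random()
        a, b = sorted(rng.sample(range(n), 2))  # two distinct cut points
    else:
        a, b = cut
    child = [-1] * n  # -1 marks a slot not yet filled
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    # p2's genes read cyclically from position b + 1, skipping those already kept.
    donors = [g for g in (p2[(b + 1 + k) % n] for k in range(n)) if g not in kept]
    pos = (b + 1) % n
    for g in donors:
        child[pos] = g
        pos = (pos + 1) % n
    return child


parent1 = [0, 1, 2, 3, 4, 5, 6, 7]
parent2 = [7, 5, 3, 1, 6, 4, 2, 0]
print(ordered_crossover(parent1, parent2, cut=(2, 4)))  # [1, 6, 2, 3, 4, 0, 7, 5]
```

Because the child inherits a contiguous block from one parent and relative order from the other, it is always a permutation, so no repair step is needed.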
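The record names a swap-based best-position operator without defining it. One plausible reading, borrowed from the swap-sequence idea used in some discrete particle swarm optimizers for permutation problems, is to compute the swaps that would turn a whale's layout into the best layout and apply each with some probability. The sketch below implements that reading only; `swap_sequence`, `move_toward`, and the per-swap probability `prob` are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def swap_sequence(current: Sequence[int], target: Sequence[int]) -> List[Tuple[int, int]]:
    """Position swaps (i, j) that, applied in order, turn `current` into `target`."""
    work = list(current)
    where = {gene: idx for idx, gene in enumerate(work)}  # gene -> current position
    swaps: List[Tuple[int, int]] = []
    for i, gene in enumerate(target):
        if work[i] != gene:
            j = where[gene]
            swaps.append((i, j))
            # Record the new positions of the two genes, then exchange them.
            where[work[i]], where[gene] = j, i
            work[i], work[j] = work[j], work[i]
    return swaps


def move_toward(current: Sequence[int], target: Sequence[int],
                prob: float, rng: random.Random) -> List[int]:
    """Hypothetical swap-based move: apply each swap with probability `prob`.

    prob = 1 reproduces `target`; prob = 0 leaves `current` unchanged.
    Every swap preserves the permutation, so the result is always valid.
    """
    moved = list(current)
    for i, j in swap_sequence(current, target):
        if rng.random() < prob:
            moved[i], moved[j] = moved[j], moved[i]
    return moved


print(swap_sequence([2, 0, 1], [0, 1, 2]))  # [(0, 1), (1, 2)]
print(move_toward([2, 0, 1], [0, 1, 2], prob=1.0, rng=random.Random(0)))  # [0, 1, 2]
```

With `prob` between 0 and 1 the whale lands somewhere between its own layout and the guide's, a discrete analogue of moving part of the way toward the best solution.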
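The record does not say how DWOA schedules its three operations. The sketch below borrows the control flow of the original continuous whale optimization algorithm (the coefficient a falls linearly from 2 to 0, A = 2·a·r − a, and a coin flip p chooses between the encircling/search branch and the spiral branch) and plugs operator callables into each branch. The branch-to-operator mapping, the `dwoa_step` function, and the `jump` stand-in are all assumptions for illustration:

```python
import random
from typing import Callable, List, Sequence

Layout = List[int]
# Hypothetical operator interface: (whale, guide, rng) -> new layout.
Operator = Callable[[Sequence[int], Sequence[int], random.Random], Layout]


def dwoa_step(population: List[Layout], best: Layout, t: int, max_iter: int,
              toward: Operator, crossover: Operator, rng: random.Random) -> List[Layout]:
    """One iteration in the style of the continuous WOA, with discrete stand-ins.

    From the original WOA: a falls linearly from 2 to 0, A = 2*a*r - a, and a
    coin flip p picks a branch. Which discrete operator serves which branch
    is an assumption here, not the paper's stated design.
    """
    a = 2.0 - 2.0 * t / max_iter
    new_population: List[Layout] = []
    for whale in population:
        A = 2.0 * a * rng.random() - a
        p = rng.random()
        if p < 0.5:
            if abs(A) < 1:
                # Exploitation: move toward the best whale (e.g. a swap-based move).
                new = toward(whale, best, rng)
            else:
                # Exploration: move toward a randomly selected whale instead.
                guide = population[rng.randrange(len(population))]
                new = toward(whale, guide, rng)
        else:
            # Spiral-branch stand-in: recombine with the best whale (e.g. ordered crossover).
            new = crossover(whale, best, rng)
        new_population.append(new)
    return new_population


def jump(whale, guide, rng):
    """Trivial stand-in operator that copies the guide, for illustration only."""
    return list(guide)


pop = [[2, 0, 1], [1, 2, 0], [0, 2, 1]]
print(dwoa_step(pop, [0, 1, 2], t=10, max_iter=10, toward=jump, crossover=jump,
                rng=random.Random(0)))  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```

At the final iteration a is 0, so |A| < 1 and every whale is guided by the best layout; early on, |A| often exceeds 1 and random whales guide the search. A full run would also track the best layout across iterations, which is omitted here.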
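PALS2-many applies many non-conflicting improving movements in a single pass, and the paper's variant targets both DFAP objectives. The record gives neither the movement type nor the acceptance rule, so the sketch below is only an analogy: it uses plain swaps as a stand-in movement, ranks candidates lexicographically (fewer contigs first, then more overlap), and applies every improving swap whose neighborhood does not touch another selected swap. The contig cutoff rule and the names `objectives` and `many_swap_pass` are assumptions:

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def objectives(order: Sequence[int], w: List[List[int]], cutoff: int) -> Tuple[int, int]:
    """(contigs, overlap), with a contig break wherever overlap < cutoff (assumed rule)."""
    edges = [w[order[i]][order[i + 1]] for i in range(len(order) - 1)]
    contigs = (1 + sum(1 for e in edges if e < cutoff)) if order else 0
    return contigs, sum(edges)


def many_swap_pass(order: Sequence[int], w: List[List[int]], cutoff: int) -> List[int]:
    """One pass applying many non-conflicting improving swaps (a stand-in, not PALS itself)."""
    base_c, base_o = objectives(order, w, cutoff)
    candidates = []
    for i, j in combinations(range(len(order)), 2):
        trial = list(order)
        trial[i], trial[j] = trial[j], trial[i]
        c, o = objectives(trial, w, cutoff)
        dc, do = c - base_c, o - base_o
        if dc < 0 or (dc == 0 and do > 0):  # fewer contigs, or same contigs and more overlap
            candidates.append((dc, -do, i, j))
    candidates.sort()  # best first: largest contig reduction, then largest overlap gain
    used: Set[int] = set()
    result = list(order)
    for _, _, i, j in candidates:
        # A swap changes the edges around positions i and j; skip it if that
        # footprint overlaps one already claimed by a selected swap.
        footprint = {i - 1, i, i + 1, j - 1, j, j + 1}
        if footprint & used:
            continue
        used |= footprint
        result[i], result[j] = result[j], result[i]
    return result


w = [[0, 3, 1], [0, 0, 3], [2, 0, 0]]  # toy overlap matrix for three fragments
print(many_swap_pass([1, 0, 2], w, cutoff=3))  # [0, 1, 2]
```

Because the selected swaps touch disjoint sets of edges, their effects add up, so under this lexicographic ordering a pass never worsens the layout and strictly improves it whenever an improving swap exists.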