CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning

Bibliographic Details
Main Authors: Ali Haisam Muhammad Rafid, Md. Toufikuzzaman, Mohammad Saifur Rahman, M. Sohel Rahman
Format: Article
Language: English
Published: BMC 2020-06-01
Series: BMC Bioinformatics
Subjects: CRISPR, sgRNA, Machine learning, Deep learning, Cas9
Online Access: http://link.springer.com/article/10.1186/s12859-020-3531-9
Affiliation: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (all authors)
ISSN: 1471-2105
Volume: 21
DOI: 10.1186/s12859-020-3531-9

Abstract
Background: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool, using sequence-based features and traditional machine learning, that can compete with deep learning models.
Results: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state of the art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we convincingly beat DeepCRISPR on three of the four cell lines in the benchmark dataset (improvements of 2.174%, 6.905% and 8.119% for those three cell lines).
Conclusion: CRISPRpred(SEQ) convincingly beats DeepCRISPR in three out of four cell lines. We believe that, by exploring further, one can design better features using only the sgRNA sequences and develop a method based solely on traditional machine learning algorithms that fully beats the deep learning models.
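
The abstract refers to hand-crafted features extracted from sgRNA sequences but does not list them in this record. As a minimal illustration of what such sequence-only features can look like, the sketch below assumes three common choices: position-specific one-hot nucleotides, position-independent k-mer counts (k = 1..2), and GC content, computed for a hypothetical 23-nt target (20-nt protospacer plus NGG PAM). These features, the sequence length, and the function names (`extract_features`, `kmer_counts`, and so on) are illustrative assumptions, not necessarily what CRISPRpred(SEQ) actually uses.

```python
from __future__ import annotations

from itertools import product

NUCLEOTIDES = "ACGT"


def validate(seq: str) -> str:
    """Upper-case the sequence and reject empty or non-ACGT input."""
    seq = seq.upper()
    if not seq or any(base not in NUCLEOTIDES for base in seq):
        raise ValueError(f"expected a non-empty A/C/G/T sequence, got {seq!r}")
    return seq


def one_hot_positions(seq: str) -> list[int]:
    """Position-specific features: 4 binary indicators (A, C, G, T) per position."""
    features: list[int] = []
    for base in seq:
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_counts(seq: str, k: int) -> dict[str, int]:
    """Position-independent features: counts of all 4**k possible k-mers, zeros included.

    Expects an already validated (upper-case A/C/G/T) sequence.
    """
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def extract_features(seq: str, max_k: int = 2) -> list[float]:
    """Concatenate one-hot, k-mer (k = 1..max_k) and GC-content features into one vector."""
    seq = validate(seq)
    vector = [float(x) for x in one_hot_positions(seq)]
    for k in range(1, max_k + 1):
        # dict preserves insertion order, so the k-mer feature order is fixed
        vector.extend(float(c) for c in kmer_counts(seq, k).values())
    vector.append(gc_content(seq))
    return vector


if __name__ == "__main__":
    # Hypothetical 23-nt target: 20-nt protospacer followed by an NGG PAM.
    example = "GACGCATAAAGATGAGACGCTGG"
    features = extract_features(example)
    print(len(features))  # 23*4 + 4 + 16 + 1 = 113
```

Because every sequence of the same length maps to a vector of the same length in a fixed order, the output could be fed directly to any traditional learner (for example, a random forest or an SVM); which learner and which exact features CRISPRpred(SEQ) uses is not stated in this record.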