CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
Main Authors: | Ali Haisam Muhammad Rafid, Md. Toufikuzzaman, Mohammad Saifur Rahman, M. Sohel Rahman |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2020-06-01 |
Series: | BMC Bioinformatics |
Subjects: | CRISPR; sgRNA; Machine learning; Deep learning; Cas9 |
Online Access: | http://link.springer.com/article/10.1186/s12859-020-3531-9 |
id |
doaj-1179e5fca13a44afae98e5775618ff5b |
---|---|
record_format |
Article |
spelling |
doaj-1179e5fca13a44afae98e5775618ff5b · 2020-11-25T03:29:46Z · eng · BMC · BMC Bioinformatics · 1471-2105 · 2020-06-01 · 21 · 1 · 1 · 13 · 10.1186/s12859-020-3531-9 · CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning · Ali Haisam Muhammad Rafid, Md. Toufikuzzaman, Mohammad Saifur Rahman, M. Sohel Rahman · Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (all four authors) · http://link.springer.com/article/10.1186/s12859-020-3531-9 · CRISPR; sgRNA; Machine learning; Deep learning; Cas9 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ali Haisam Muhammad Rafid, Md. Toufikuzzaman, Mohammad Saifur Rahman, M. Sohel Rahman |
title |
CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2020-06-01 |
description |
Background: Recent work on CRISPR genome editing tools mainly employs deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool, based on sequence-derived features and traditional machine learning, that can compete with deep learning models. Results: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state of the art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we were able to convincingly beat DeepCRISPR in three of the four cell lines in the benchmark dataset (improvements of 2.174%, 6.905% and 8.119% for those three cell lines). Conclusion: CRISPRpred(SEQ) convincingly beats DeepCRISPR in 3 of the 4 cell lines. We believe that, with further exploration, one can design better features using only the sgRNA sequences and develop a method based solely on traditional machine learning algorithms that fully outperforms deep learning models. |
topic |
CRISPR; sgRNA; Machine learning; Deep learning; Cas9 |
url |
http://link.springer.com/article/10.1186/s12859-020-3531-9 |
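The abstract states only that CRISPRpred(SEQ) relies on hand-crafted features extracted from sgRNA sequences; this record does not list those features. As an illustration only, the sketch below computes three kinds of sequence features often used for on-target activity prediction: overall GC content, position-specific one-hot nucleotides, and position-independent dinucleotide counts. The feature set, the function names (`gc_content`, `position_one_hot`, `kmer_counts`, `extract_features`), and the 23-nt input (a 20-nt protospacer plus an NGG PAM) are assumptions for this example, not the authors' published feature set.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of positions that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def position_one_hot(seq: str) -> list[int]:
    """Position-specific encoding: four 0/1 indicators (A, C, G, T) per position."""
    return [int(base == n) for base in seq for n in NUCLEOTIDES]


def kmer_counts(seq: str, k: int = 2) -> list[int]:
    """Position-independent counts of all 4**k k-mers, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts


def extract_features(seq: str) -> list[float]:
    """Concatenate the hypothetical feature groups into one fixed-length vector."""
    seq = seq.upper()
    if not seq or set(seq) - set(NUCLEOTIDES):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return [gc_content(seq)] + position_one_hot(seq) + kmer_counts(seq, k=2)


if __name__ == "__main__":
    # Hypothetical 23-nt target: 20-nt protospacer followed by a TGG PAM.
    guide = "GACGCATAAAGATGAGACGCTGG"
    features = extract_features(guide)
    print(len(features))  # 1 GC value + 4 * 23 one-hot bits + 16 dinucleotide counts = 109
```

Because every guide of the same length yields a vector of the same size, the vectors can be stacked into a feature matrix and passed to any traditional learner (for example, a scikit-learn random forest or gradient-boosting model). Which learner and which exact features CRISPRpred(SEQ) uses is not stated in this record.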