Paired-end small RNA sequencing reveals a possible overestimation in the isomiR sequence repertoire previously reported from conventional single read data analysis

Abstract Background Next generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or e...


Bibliographic Details
Main Authors: Jose Francisco Sanchez Herrero, Raquel Pluvinet, Antonio Luna de Haro, Lauro Sumoy
Format: Article
Language: English
Published: BMC 2021-04-01
Series: BMC Bioinformatics
Subjects: miRNA; IsomiR; Paired-end sequencing
Online Access: https://doi.org/10.1186/s12859-021-04128-1
id doaj-6943c4994f754155964bea38112780d5
record_format Article
spelling doaj-6943c4994f754155964bea38112780d5; 2021-05-02T11:49:37Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2021-04-01; 22; 1; 1; 16; 10.1186/s12859-021-04128-1; Jose Francisco Sanchez Herrero (0), Raquel Pluvinet (1), Antonio Luna de Haro (2), Lauro Sumoy (3); Institut Germans Trias i Pujol (IGTP), listed for each of the four authors; https://doi.org/10.1186/s12859-021-04128-1; miRNA; IsomiR; Paired-end sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Jose Francisco Sanchez Herrero
Raquel Pluvinet
Antonio Luna de Haro
Lauro Sumoy
title Paired-end small RNA sequencing reveals a possible overestimation in the isomiR sequence repertoire previously reported from conventional single read data analysis
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-04-01
description Abstract
Background: Next-generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or editing of internal bases, resulting in base differences relative to the template DNA sequence. We hypothesized that some component of the isomiR variation reported so far could be due to systematic technical noise and not real.
Results: We have developed the XICRA pipeline to analyze small RNA sequencing data at the isomiR level. We exploited its ability to use single or merged reads to compare isomiR results derived from paired-end (PE) reads with those from single reads (SR), to address whether detectable sequence differences relative to canonical miRNAs found in isomiRs are true biological variations or the result of errors in sequencing. We have detected non-negligible systematic differences between SR and PE data, which primarily affect putative internally edited isomiRs and, at a much lower frequency, terminal length-changing isomiRs. This is relevant for the identification of true isomiRs in small RNA sequencing datasets.
Conclusions: We conclude that potential artifacts derived from sequencing errors and/or data processing could result in an overestimation of the abundance and diversity of miRNA isoforms. Efforts in annotating the isomiRnome should take this into account.
topic miRNA
IsomiR
Paired-end sequencing
url https://doi.org/10.1186/s12859-021-04128-1