Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer [version 1; peer review: 2 approved]

Background: The first step of virtually all next-generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexib...


Bibliographic Details
Main Authors:	Oscar G Wilkins, Charlotte Capitanchik, Nicholas M. Luscombe, Jernej Ule
Format:	Article
Language:	English
Published:	Wellcome 2021-06-01
Series:	Wellcome Open Research
Online Access:	https://wellcomeopenresearch.org/articles/6-141/v1

id	doaj-38169a00edf141d49a888e774d21ac9a
record_format	Article
spelling	doaj-38169a00edf141d49a888e774d21ac9a | 2021-07-19T09:30:09Z | eng | Wellcome | Wellcome Open Research | 2398-502X | 2021-06-01 | 6 | 10.12688/wellcomeopenres.16791.1 | 18522 | Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer [version 1; peer review: 2 approved] | Oscar G Wilkins (0); Charlotte Capitanchik (1); Nicholas M. Luscombe (2); Jernej Ule (3) | (0) The Francis Crick Institute, London, UK; (1) The Francis Crick Institute, London, UK; (2) Okinawa Institute of Science & Technology Graduate University, Okinawa, Japan; (3) The Francis Crick Institute, London, UK
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Oscar G Wilkins Charlotte Capitanchik Nicholas M. Luscombe Jernej Ule
author_sort	Oscar G Wilkins
title	Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer [version 1; peer review: 2 approved]
publisher	Wellcome
series	Wellcome Open Research
issn	2398-502X
publishDate	2021-06-01
description	Background: The first step of virtually all next-generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. Results: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5’ and 3’ ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. Conclusions: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user-friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.
url	https://wellcomeopenresearch.org/articles/6-141/v1


Similar Items