PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping

Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudog...

Bibliographic Details
Main Authors: Zachary Stephens, Dragana Milosevic, Benjamin Kipp, Stefan Grebe, Ravishankar K. Iyer, Jean-Pierre A. Kocher
Format: Article
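Language: English
Published: Frontiers Media S.A. 2021-07-01
Series: Frontiers in Genetics
Subjects: long reads; pseudogene; structural variation; congenital adrenal hyperplasia; CYP21A2; bioinformatics
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.716586/full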
id doaj-b11181a9f8864e89816d3eb364e98c24
record_format Article
spelling doaj-b11181a9f8864e89816d3eb364e98c24
2021-07-28T09:54:06Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2021-07-01
12
10.3389/fgene.2021.716586
716586
PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
Zachary Stephens (Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States)
Dragana Milosevic (Mayo Clinic, Rochester, MN, United States)
Benjamin Kipp (Mayo Clinic, Rochester, MN, United States)
Stefan Grebe (Mayo Clinic, Rochester, MN, United States)
Ravishankar K. Iyer (Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States)
Jean-Pierre A. Kocher (Mayo Clinic, Rochester, MN, United States)
collection DOAJ
language English
format Article
sources DOAJ
author Zachary Stephens
Dragana Milosevic
Benjamin Kipp
Stefan Grebe
Ravishankar K. Iyer
Jean-Pierre A. Kocher
title PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2021-07-01
description Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy with simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL using simulated reads sweeping over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at github.com/zstephens/pb-motif.
topic long reads
pseudogene
structural variation
congenital adrenal hyperplasia
CYP21A2
bioinformatics
url https://www.frontiersin.org/articles/10.3389/fgene.2021.716586/full