Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly

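Centromeric alpha satellite (AS) is composed of highly identical higher-order DNA repeats, which make standard sequence assembly impossible. Because of this, AS repeats were severely underrepresented in previous versions of the human genome assembly, which showed large centromeric gaps. The latest hg38 assembly (GCA_000001405.15) employed a novel method of approximate representation of these sequences, using AS reference models to fill the gaps. As a result, much more assembled AS became available for genomic analysis. We used the PERCON program, previously described by us, to annotate the various suprachromosomal families (SFs) of AS in the hg38 assembly, and presented the results of our primary analysis as an easy-to-read track for the UCSC Genome Browser.

The monomeric classes characteristic of the five known SFs were color-coded, which allowed quick visual assessment of AS composition across whole multi-megabase centromeres, down to each individual AS monomer. This is the first comprehensive annotation of AS in the human genome assembly. It showed the expected prevalence of the known major types of AS organization characteristic of the five established SFs. Some less common types of AS arrays were also identified, such as pure R2 domains in SF5, apparent J/R and D/R mixes in SF1 and SF2, and several different SF4 higher-order repeats among reference models and in regular contigs. No new SFs or large unclassed AS domains were discovered. The dataset reveals the architecture of human centromeres and allows classification of AS sequence reads by alignment to the annotated hg38 assembly. The data were deposited at the following UCSC Genome Browser custom-track link:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgt.customText=https://dl.dropboxusercontent.com/u/22994534/AS-tracks/human-GRC-hg38-M1SFs.bed.bz2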
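
The abstract states that the annotation was released as a bzip2-compressed BED custom track (human-GRC-hg38-M1SFs.bed.bz2) and that it supports classifying AS reads by alignment to hg38. A minimal Python sketch of both uses follows. It assumes the file is standard BED with a monomer class or SF label in the name (fourth) column; that column layout, the example labels, and the helper names parse_bed, read_bed_bz2, composition and classify_interval are hypothetical and are not taken from the authors' PERCON output. The toy records are spaced like ~171-bp monomers, but their coordinates and labels are made up.

```python
import bz2
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One BED record reduced to the fields used here: chromosome, 0-based start,
# exclusive end, and the name column (ASSUMED to hold the AS class/SF label).
Record = Tuple[str, int, int, str]


def parse_bed(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (chrom, start, end, name), skipping blank, track, browser and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        # BED requires chrom/start/end; the name column (4th) is optional.
        name = fields[3] if len(fields) > 3 else "."
        yield fields[0], int(fields[1]), int(fields[2]), name


def read_bed_bz2(path: str) -> List[Record]:
    """Read a bzip2-compressed BED file, e.g. a local copy of the deposited track."""
    with bz2.open(path, "rt") as handle:
        return list(parse_bed(handle))


def composition(records: Iterable[Record]) -> Dict[str, int]:
    """Total annotated bases per label (BED intervals are half-open)."""
    totals: Dict[str, int] = defaultdict(int)
    for _, start, end, name in records:
        totals[name] += end - start
    return dict(totals)


def classify_interval(records: Iterable[Record], chrom: str, start: int, end: int) -> str:
    """Return the label overlapping [start, end) the most, or 'unclassified'."""
    overlap: Dict[str, int] = defaultdict(int)
    for r_chrom, r_start, r_end, name in records:
        if r_chrom != chrom:
            continue
        shared = min(end, r_end) - max(start, r_start)
        if shared > 0:
            overlap[name] += shared
    if not overlap:
        return "unclassified"
    return max(overlap, key=overlap.get)


if __name__ == "__main__":
    # Hypothetical records; real label strings in the track may differ.
    example = [
        "track name=AS_SFs itemRgb=On",
        "chr1\t100\t271\tSF1_hypothetical\t0\t+",
        "chr1\t271\t442\tSF2_hypothetical\t0\t+",
        "chr1\t442\t613\tSF1_hypothetical\t0\t+",
    ]
    recs = list(parse_bed(example))
    print(composition(recs))  # {'SF1_hypothetical': 342, 'SF2_hypothetical': 171}
    print(classify_interval(recs, "chr1", 250, 300))  # SF2_hypothetical
```

The overlap lookup is a linear scan for clarity. For genome-wide read sets, an interval index (for example, records sorted per chromosome and searched by start position) would be the natural replacement.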

Bibliographic Details
Main Authors: V.A. Shepelev, L.I. Uralsky, A.A. Alexandrov, Y.B. Yurov, E.I. Rogaev, I.A. Alexandrov
Format: Article
Language: English
Published: Elsevier 2015-09-01
Series: Genomics Data
Online Access: http://www.sciencedirect.com/science/article/pii/S2213596015000987
Affiliations:
V.A. Shepelev, L.I. Uralsky, A.A. Alexandrov: Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov sq. 2, Moscow 123182, Russia
Y.B. Yurov, I.A. Alexandrov: Research Center of Mental Health, Russian Academy of Medical Sciences, Zagorodnoe sh. 2, Moscow 113152, Russia
E.I. Rogaev: Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
Volume: 5, pages 139–146
DOI: 10.1016/j.gdata.2015.05.035
DOAJ ID: doaj-d1078923170b4fcebe0e4fceeb66870b
ISSN: 2213-5960