Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly

In the latest hg38 human genome assembly, centromeric gaps have been filled in by alpha satellite (AS) reference models (RMs), which are statistical representations of the homogeneous higher-order repeat (HOR) arrays that make up the bulk of the centromeric regions. We analyzed these models to compose an...

Bibliographic Details
Main Authors: L.I. Uralsky, V.A. Shepelev, A.A. Alexandrov, Y.B. Yurov, E.I. Rogaev, I.A. Alexandrov
Format: Article
Language: English
Published: Elsevier 2019-06-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340919300575
id doaj-dbb96c30d2db450888eca6ecd2fdf50f
record_format Article
spelling doaj-dbb96c30d2db450888eca6ecd2fdf50f
2020-11-25T02:33:19Z
eng
Elsevier
Data in Brief
2352-3409
2019-06-01
24
Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly
L.I. Uralsky: Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia; Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
V.A. Shepelev: Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia; Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
A.A. Alexandrov: Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia
Y.B. Yurov: Research Center of Mental Health, Zagorodnoe Sh. 2, Moscow 113152, Russia
E.I. Rogaev: Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Department of Psychiatry, Brudnick Neuropsychiatric Research Institute, University of Massachusetts Medical School, Worcester, MA 01604, USA; Lomonosov Moscow State University, Biological Department, Center for Genetics and Genetic Technologies, Moscow, 119192, Russia; Corresponding authors. Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.
I.A. Alexandrov: Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Research Center of Mental Health, Zagorodnoe Sh. 2, Moscow 113152, Russia; Corresponding authors. Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.
Keywords: Centromeres, Alpha satellite, Suprachromosomal families, Higher-order repeats, hg38 human genome assembly
http://www.sciencedirect.com/science/article/pii/S2352340919300575
collection DOAJ
language English
format Article
sources DOAJ
author L.I. Uralsky
V.A. Shepelev
A.A. Alexandrov
Y.B. Yurov
E.I. Rogaev
I.A. Alexandrov
author_sort L.I. Uralsky
title Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly
publisher Elsevier
series Data in Brief
issn 2352-3409
publishDate 2019-06-01
description In the latest hg38 human genome assembly, centromeric gaps have been filled in by alpha satellite (AS) reference models (RMs), which are statistical representations of the homogeneous higher-order repeat (HOR) arrays that make up the bulk of the centromeric regions. We analyzed these models to compose an atlas of human AS HORs in which each monomer of a HOR is represented by a number of its polymorphic sequence variants. We combined these data with the HMMER sequence analysis platform to annotate AS HORs in the assembly. This led to the discovery of a new type of low-copy-number, highly divergent HORs that were not represented by RMs; these were included in the dataset. The annotation can be viewed as a UCSC Genome Browser custom track (the HOR-track) and used together with our previous annotation of AS suprachromosomal families (SFs) in the same assembly, where each AS monomer can be viewed in its genomic context together with its classification into one of the 5 major SFs (the SF-track). To catalog the diversity of AS HORs in the human genome, we introduced a new naming system: each HOR received a name that shows its SF, chromosomal location and index number. Here we present the first installment of the HOR-track, covering only the 17 HORs that belong to SF1, which forms live functional centromeres in chromosomes 1, 3, 5, 6, 7, 10, 12, 16 and 19 as well as a large number of minor dead HOR domains, both homogeneous and divergent. The monomer-by-monomer HOR annotation used for this dataset, as opposed to annotation of whole HOR repeats, allows mapping and quantification of various structural variants of AS HORs, which can be used to collect data on inter-individual polymorphism of AS. Keywords: Centromeres, Alpha satellite, Suprachromosomal families, Higher-order repeats, hg38 human genome assembly
url http://www.sciencedirect.com/science/article/pii/S2352340919300575
_version_ 1724814873498484736
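
The description notes that monomer-by-monomer annotation allows structural variants of HORs to be quantified. As a rough illustration only, the sketch below tallies monomer labels from BED-style lines of the kind a UCSC custom track uses. The label format `<HOR name>.<monomer number>`, the name `HOR_A`, and all coordinates are assumptions made for this example; they are not taken from the dataset or from its naming system.

```python
from collections import Counter


def parse_bed_line(line):
    """Split one tab-separated BED line into (chrom, start, end, name)."""
    chrom, start, end, name = line.split("\t")[:4]
    return chrom, int(start), int(end), name


def monomer_counts(bed_lines):
    """Count occurrences of each (HOR name, monomer number) label.

    Assumes (hypothetically) that the BED name column looks like
    '<HOR name>.<monomer number>'.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and UCSC header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        _, _, _, name = parse_bed_line(line)
        hor, _, monomer = name.rpartition(".")
        counts[(hor, int(monomer))] += 1
    return counts


# Made-up example: a two-monomer HOR whose last copy lacks monomer 2.
example = [
    "track name=hypothetical_HOR_track",
    "chr1\t100\t271\tHOR_A.1",
    "chr1\t271\t442\tHOR_A.2",
    "chr1\t442\t613\tHOR_A.1",
    "chr1\t613\t784\tHOR_A.2",
    "chr1\t784\t955\tHOR_A.1",
]
counts = monomer_counts(example)
for (hor, monomer), n in sorted(counts.items()):
    print(hor, monomer, n)
```

Unequal counts for monomers of the same HOR, as in this made-up input, are the kind of signal that whole-HOR annotation would hide and that per-monomer labels expose.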