Landscape and variation of novel retroduplications in 26 human populations.

Bibliographic Details
Main Authors: Yan Zhang, Shantao Li, Alexej Abyzov, Mark B Gerstein
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2017-06-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1005567
ISSN: 1553-734X, 1553-7358

Description
Retroduplications arise from the reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications that combines high-coverage exome and low-coverage whole-genome sequencing data, using information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes with novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these trees reflect superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes that differentiate superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication inserted into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) insertion sites were correlated with regular nucleosome positioning; (2) retroduplications, predictably, tend to avoid conserved functional regions such as exons but, somewhat surprisingly, also avoid introns; (3) they tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity; and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact of retroduplications and their association with genomic elements. We anticipate that our approach and analytical methodology will find application in more clinical contexts, where exome sequencing data are abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
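The exon-exon junction signal mentioned in the abstract rests on a simple observation: genomic DNA at the parent gene still contains its introns, so a DNA read that runs directly from the end of one exon into the start of the next suggests a processed (intron-less) copy somewhere else in the genome. The Python sketch below illustrates only that idea and is not the authors' pipeline. The function names (`junction_sequences`, `count_junction_reads`), the `flank` parameter, and the use of exact substring matching in place of real split-read alignment are all hypothetical simplifications.

```python
from typing import List, Tuple

# Complement table for uppercase DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_sequences(exons: List[str], flank: int) -> List[Tuple[str, int]]:
    """Build one probe per exon-exon junction of a parent gene (hypothetical helper).

    Each probe is the last `flank` bases of exon i joined to the first
    `flank` bases of exon i + 1, a sequence that only exists once the
    intron between them has been removed. Returns (probe, i) pairs.
    """
    probes = []
    for i in range(len(exons) - 1):
        left, right = exons[i], exons[i + 1]
        if len(left) < flank or len(right) < flank:
            continue  # exon too short to supply a full flank on this side
        probes.append((left[-flank:] + right[:flank], i))
    return probes


def count_junction_reads(reads: List[str], exons: List[str], flank: int = 10) -> List[int]:
    """Count DNA reads supporting each exon-exon junction, on either strand.

    Assumption: exact substring matching stands in for split-read alignment.
    """
    probes = junction_sequences(exons, flank)
    counts = [0] * max(len(exons) - 1, 0)
    for read in reads:
        strands = (read, reverse_complement(read))
        for probe, i in probes:
            if any(probe in s for s in strands):
                counts[i] += 1
    return counts


if __name__ == "__main__":
    # Toy parent gene with three exons, listed in order without their introns.
    exons = ["ACGTACGTAA", "GGCCTTGGAC", "TTAGCATGCA"]
    reads = [
        "TACGTAAGGCCTT",  # spans the exon 1 / exon 2 junction
        "GTAAGTATTTCAG",  # exon 1 end running into its intron: parent locus
        "TGCTAAGTCCA",    # reverse strand of a read spanning exon 2 / exon 3
    ]
    print(count_junction_reads(reads, exons, flank=4))  # [1, 1]
```

The second toy read, which runs from an exon straight into intronic sequence, is what the parent locus itself yields, so it supports no junction. A real pipeline would also need to align reads with mismatches and, as the abstract notes, combine this signal with discordant paired-end reads to place the insertion; none of that is modeled here.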