A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K In human populations.

Bibliographic Details
Main Authors: Weiling Li, Lin Lin, Raunaq Malhotra, Lei Yang, Raj Acharya, Mary Poss
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-03-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6456218?pdf=render
id doaj-1ba53ce11f8a48b891dec28a732679a9
record_format Article
spelling doaj-1ba53ce11f8a48b891dec28a732679a9; 2020-11-25T01:18:25Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2019-03-01; vol. 15, no. 3, e1006564; doi:10.1371/journal.pcbi.1006564
collection DOAJ
sources DOAJ
issn 1553-734X
1553-7358
description Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; not all individuals have a retrovirus at a specific genomic location. It is possible that HERV-Ks contribute to human disease because people differ in both the number and the genomic location of these retroviruses. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population prevalence of HERV-K provirus at each polymorphic site is unknown and because it is challenging to identify closely related elements such as HERV-K from short-read sequence data. We present an integrated and computationally robust approach that uses whole-genome short-read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (k-mers) from whole-genome sequence data that match a reference set of k-mers unique to each HERV-K locus, and applies mixture model-based clustering to these values to account for low-depth sequence data. Our analysis of 1000 Genomes Project (KGP) data reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool to easily depict the proportion of the KGP populations with any combination of polymorphic HERV-K proviruses. Further, because HERV-K is insertionally polymorphic, the genome burden of known polymorphic HERV-K varies among humans; this burden is lowest in East Asian (EAS) individuals. Our study identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources will advance research on HERV-K contributions to human diseases.
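
The two steps named in the description (a per-locus k-mer match proportion, then mixture-model clustering of those proportions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmers`, `matched_proportion`, and `two_component_em`, the use of exactly two Gaussian components, the toy sequences, and the reading of "proportion" as the fraction of a locus's reference k-mers seen in the reads are all assumptions made for illustration. Reverse complements and sequencing errors are ignored.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matched_proportion(reads, locus_kmers, k):
    """Fraction of a locus's reference k-mers seen in at least one read.

    Assumption: locus_kmers already holds only k-mers unique to this locus.
    """
    if not locus_kmers:
        raise ValueError("locus_kmers must not be empty")
    seen = set()
    for read in reads:
        seen |= kmers(read, k) & locus_kmers
    return len(seen) / len(locus_kmers)


def two_component_em(values, iters=200):
    """Fit a 1-D two-component Gaussian mixture by EM (hypothetical helper).

    Returns (means, labels); label 1 marks the component initialised at the
    largest value, intended here to stand for "provirus present".
    """
    lo, hi = min(values), max(values)
    mu = [lo, hi]
    var = [max((hi - lo) ** 2 / 16, 1e-6)] * 2  # variance floor avoids collapse
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each value
        resp = []
        for x in values:
            p = [
                w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                / math.sqrt(2 * math.pi * var[j])
                for j in range(2)
            ]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            w[j] = nj / len(values)
            mu[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var[j] = max(
                sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, values)) / nj,
                1e-6,
            )
    labels = [1 if r[1] > r[0] else 0 for r in resp]
    return mu, labels


# Toy usage: one made-up locus and made-up per-individual proportions.
locus = kmers("ACGTACGGTC", 4)
proportion = matched_proportion(["ACGTAC", "TTTTTT"], locus, 4)
means, labels = two_component_em([0.02, 0.05, 0.03, 0.90, 0.95, 0.92])
```

Clustering the proportions, rather than applying a fixed cutoff, is what lets low-depth samples (which recover fewer of the reference k-mers even when the provirus is present) still be grouped with the occupied state; the number of components and the distribution family used here are illustrative choices only.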