Critical evaluation of linear regression models for cell-subtype specific methylation signal from mixed blood cell DNA.

Epigenome-wide association studies seek to identify DNA methylation sites associated with clinical outcomes. Difference in observed methylation between specific cell-subtypes is often of interest; however, available samples often comprise a mixture of cells. To date, cell-subtype estimates have been...


Bibliographic Details
Main Authors: Daniel W Kennedy, Nicole M White, Miles C Benton, Andrew Fox, Rodney J Scott, Lyn R Griffiths, Kerrie Mengersen, Rodney A Lea
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0208915
id doaj-093a94c2e9ed4fe4a844484295beecff
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Daniel W Kennedy
Nicole M White
Miles C Benton
Andrew Fox
Rodney J Scott
Lyn R Griffiths
Kerrie Mengersen
Rodney A Lea
author_sort Daniel W Kennedy
title Critical evaluation of linear regression models for cell-subtype specific methylation signal from mixed blood cell DNA.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2018-01-01
description Epigenome-wide association studies seek to identify DNA methylation sites associated with clinical outcomes. Differences in observed methylation between specific cell-subtypes are often of interest; however, available samples often comprise a mixture of cells. To date, cell-subtype estimates have been obtained from mixed-cell DNA data using linear regression models, but the accuracy of such estimates has not been critically assessed. We evaluated linear regression performance for cell-subtype specific methylation estimation using a 450K methylation array dataset of both mixed-cell and cell-subtype sorted samples from six healthy males. CpGs associated with each cell-subtype were first identified using t-tests between groups of cell-subtype sorted samples. Reduced panels of reliably accurate CpGs were then identified from mixed-cell samples using an accuracy heuristic (D). Performance was assessed by comparing cell-subtype specific estimates from mixed-cell samples with the corresponding cell-sorted means using the mean absolute error (MAE) and the coefficient of determination (R2). At the cell-subtype level, methylation levels at 3272 CpGs could be estimated to within a MAE of 5% of the expected value. The cell-subtypes with the highest accuracy were CD56+ NK (R2 = 0.56) and CD8+ T (R2 = 0.48), where 23% of sites were accurately estimated. Hierarchical clustering and pathway enrichment analysis confirmed the biological relevance of the panels. Our results suggest that linear regression for cell-subtype specific methylation estimation is accurate only for some cell-subtypes at a small fraction of cell-associated sites but may be applicable to EWASs of disease traits with a blood-based pathology. Although sample size was a limitation in this study, we suggest that alternative statistical methods will provide the greatest performance improvements.
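The evaluation described in the abstract (regression-based estimates from mixed-cell data compared against cell-sorted means using MAE and R2) can be illustrated with a minimal sketch. This is not the authors' code or their exact model: it assumes a simple no-intercept mixture model in which each mixed-cell measurement is a proportion-weighted sum of subtype methylation levels, and the function names, array shapes, and synthetic data are hypothetical.

```python
import numpy as np


def estimate_subtype_methylation(mixed, proportions):
    """Least-squares estimate of subtype-specific methylation per CpG.

    mixed:       (n_samples, n_cpgs) methylation measured in mixed-cell samples
    proportions: (n_samples, n_subtypes) cell-subtype fractions per sample
    returns:     (n_subtypes, n_cpgs) estimated methylation per subtype
    Assumed model (illustrative only): mixed = proportions @ subtype_levels + noise
    """
    estimates, *_ = np.linalg.lstsq(proportions, mixed, rcond=None)
    return estimates


def mean_absolute_error(estimated, reference):
    """Mean absolute difference between estimates and reference values."""
    return float(np.mean(np.abs(estimated - reference)))


def r_squared(estimated, reference):
    """Coefficient of determination of the estimates against the reference."""
    ss_res = np.sum((reference - estimated) ** 2)
    ss_tot = np.sum((reference - np.mean(reference)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic, noise-free demonstration (hypothetical data, not from the study).
rng = np.random.default_rng(0)
sorted_means = rng.uniform(0.0, 1.0, size=(3, 50))   # 3 subtypes, 50 CpGs
fractions = rng.dirichlet(np.ones(3), size=20)       # 20 mixed-cell samples
mixed_samples = fractions @ sorted_means

estimated = estimate_subtype_methylation(mixed_samples, fractions)
print("MAE:", mean_absolute_error(estimated, sorted_means))
print("R2: ", r_squared(estimated, sorted_means))
```

With real array data the mixed-cell measurements are noisy and the number of samples is small (six in the study), so the estimates would be expected to deviate from the cell-sorted means, which is what the MAE and R2 figures in the abstract quantify.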
url https://doi.org/10.1371/journal.pone.0208915