Critical evaluation of linear regression models for cell-subtype specific methylation signal from mixed blood cell DNA.
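Main Authors: | Daniel W Kennedy, Nicole M White, Miles C Benton, Andrew Fox, Rodney J Scott, Lyn R Griffiths, Kerrie Mengersen, Rodney A Lea |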
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2018-01-01 |
Series: | PLoS ONE, vol. 13, no. 12, e0208915 |
Online Access: | https://doi.org/10.1371/journal.pone.0208915 |
id | doaj-093a94c2e9ed4fe4a844484295beecff |
---|---|
collection | DOAJ |
author | Daniel W Kennedy, Nicole M White, Miles C Benton, Andrew Fox, Rodney J Scott, Lyn R Griffiths, Kerrie Mengersen, Rodney A Lea |
issn | 1932-6203 |
description | Epigenome-wide association studies (EWASs) seek to identify DNA methylation sites associated with clinical outcomes. Differences in observed methylation between specific cell-subtypes are often of interest; however, available samples often comprise a mixture of cells. To date, cell-subtype estimates have been obtained from mixed-cell DNA data using linear regression models, but the accuracy of such estimates has not been critically assessed. We evaluated linear regression performance for cell-subtype-specific methylation estimation using a 450K methylation array dataset of both mixed-cell and cell-subtype-sorted samples from six healthy males. CpGs associated with each cell-subtype were first identified using t-tests between groups of cell-subtype-sorted samples. Reduced panels of reliably accurate CpGs were then identified from mixed-cell samples using an accuracy heuristic (D). Performance was assessed by comparing cell-subtype-specific estimates from mixed cells with the corresponding cell-sorted means, using the mean absolute error (MAE) and the coefficient of determination (R²). At the cell-subtype level, methylation levels at 3272 CpGs could be estimated to within an MAE of 5% of the expected value. The cell-subtypes with the highest accuracy were CD56+ NK (R² = 0.56) and CD8+ T (R² = 0.48), for which 23% of sites were accurately estimated. Hierarchical clustering and pathway enrichment analysis confirmed the biological relevance of the panels. Our results suggest that linear regression for cell-subtype-specific methylation estimation is accurate only for some cell-subtypes and at a small fraction of cell-associated sites, but may be applicable to EWASs of disease traits with a blood-based pathology. Although sample size was a limitation in this study, we suggest that alternative statistical methods will provide the greatest performance improvements. |
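
The record's description refers to linear regression models that recover cell-subtype-specific methylation from mixed-cell DNA, scored by MAE and R² against cell-sorted reference values. The record does not state the model specification, so the following is only a minimal Python/NumPy sketch of one common formulation. It assumes that each mixed-cell beta value is a proportion-weighted sum of cell-subtype methylation levels, and that the cell-subtype proportions of each sample are known. The function names (`estimate_cell_methylation`, `mean_absolute_error`, `r_squared`) are hypothetical and the usage data are synthetic. This is not the authors' code, and it does not implement their accuracy heuristic (D).

```python
import numpy as np


def estimate_cell_methylation(mixed, proportions):
    """Least-squares estimate of cell-subtype-specific methylation per CpG.

    Assumed model (one common formulation, not necessarily the paper's):
        mixed[i, j] ~= sum_k proportions[i, k] * mu[k, j]
    mixed:       (n_samples, n_cpgs) beta values from mixed-cell DNA
    proportions: (n_samples, n_subtypes) cell-subtype proportions per sample
    Returns mu:  (n_subtypes, n_cpgs) estimated methylation per subtype.
    """
    # One regression per CpG column, solved jointly; no intercept term.
    mu, *_ = np.linalg.lstsq(proportions, mixed, rcond=None)
    return mu


def mean_absolute_error(estimated, reference):
    """Mean absolute difference between estimates and reference values."""
    return float(np.mean(np.abs(estimated - reference)))


def r_squared(estimated, reference):
    """Coefficient of determination of the estimates against the reference."""
    ss_res = np.sum((reference - estimated) ** 2)
    ss_tot = np.sum((reference - np.mean(reference)) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic illustration only: 6 samples, 3 cell subtypes, 5 CpGs.
    proportions = rng.dirichlet(np.ones(3), size=6)  # each row sums to 1
    true_mu = rng.uniform(0.0, 1.0, size=(3, 5))  # stands in for sorted-cell means
    mixed = proportions @ true_mu + rng.normal(0.0, 0.02, size=(6, 5))
    mu_hat = estimate_cell_methylation(mixed, proportions)
    print("MAE:", mean_absolute_error(mu_hat, true_mu))
    print("R^2:", r_squared(mu_hat, true_mu))
```

Plain least squares is unconstrained, so estimates can fall outside the [0, 1] beta-value range, and with only a few samples per CpG the fit is sensitive to noise. Clipping or constrained regression are possible alternatives; they are left out here to keep the sketch minimal.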