Subpopulation Discovery in Epidemiological Data with Subspace Clustering
A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors towards an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment...
Main Authors: | Niemann, Uli; Spiliopoulou, Myra; Völzke, Henry; Kühn, Jens-Peter |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2014-12-01 |
Series: | Foundations of Computing and Decision Sciences |
Online Access: | https://doi.org/10.2478/fcds-2014-0015 |
id |
doaj-ea10838ccac5486e9ca26e5c00be3a78 |
---|---|
record_format |
Article |
spelling |
doaj-ea10838ccac5486e9ca26e5c00be3a78; 2021-09-05T21:00:54Z; eng; Sciendo; Foundations of Computing and Decision Sciences; ISSN 2300-3405; 2014-12-01; vol. 39, no. 4, pp. 271-300; 10.2478/fcds-2014-0015; fcds-2014-0015; Subpopulation Discovery in Epidemiological Data with Subspace Clustering; Niemann Uli (Otto-von-Guericke University Magdeburg, Germany); Spiliopoulou Myra (Otto-von-Guericke University Magdeburg, Germany); Völzke Henry (University Medicine Greifswald, Germany); Kühn Jens-Peter (University Medicine Greifswald, Germany); https://doi.org/10.2478/fcds-2014-0015 |
collection |
DOAJ |
sources |
DOAJ |
author |
Niemann Uli Spiliopoulou Myra Völzke Henry Kühn Jens-Peter |
title |
Subpopulation Discovery in Epidemiological Data with Subspace Clustering |
publisher |
Sciendo |
series |
Foundations of Computing and Decision Sciences |
issn |
2300-3405 |
publishDate |
2014-12-01 |
description |
A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors towards an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment before cluster discovery and quality assessment after learning the clusters. Epidemiological data usually do not have a ground truth for the verification of clusters found in subspaces. Hence, we introduce quality assessment through juxtaposition of the learned models with “models-of-randomness”, i.e. models that do not reflect a true cluster structure. On the basis of this workflow, we select subspace clustering methods and compare and discuss their performance. We use a dataset with hepatic steatosis as outcome, but our findings apply to arbitrary epidemiological cohort data that have tens of variables and exhibit class skew. |
url |
https://doi.org/10.2478/fcds-2014-0015 |