Subpopulation Discovery in Epidemiological Data with Subspace Clustering

A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors towards an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment...


Bibliographic Details
Main Authors: Niemann Uli, Spiliopoulou Myra, Völzke Henry, Kühn Jens-Peter
Format: Article
Language: English
Published: Sciendo 2014-12-01
Series: Foundations of Computing and Decision Sciences
Online Access: https://doi.org/10.2478/fcds-2014-0015
id doaj-ea10838ccac5486e9ca26e5c00be3a78
record_format Article
spelling doaj-ea10838ccac5486e9ca26e5c00be3a78
2021-09-05T21:00:54Z
eng
Sciendo
Foundations of Computing and Decision Sciences
2300-3405
2014-12-01
Volume 39, Issue 4, Pages 271-300
10.2478/fcds-2014-0015
fcds-2014-0015
Subpopulation Discovery in Epidemiological Data with Subspace Clustering
Niemann Uli (Otto-von-Guericke University Magdeburg, Germany)
Spiliopoulou Myra (Otto-von-Guericke University Magdeburg, Germany)
Völzke Henry (University Medicine Greifswald, Germany)
Kühn Jens-Peter (University Medicine Greifswald, Germany)
A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors towards an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment before cluster discovery and quality assessment after learning the clusters. Epidemiological data usually do not have a ground truth for the verification of clusters found in subspaces. Hence, we introduce quality assessment through juxtaposition of the learned models to “models-of-randomness”, i.e. models that do not reflect a true cluster structure. On the basis of this workflow, we select subspace clustering methods, compare and discuss their performance. We use a dataset with hepatic steatosis as outcome, but our findings apply to arbitrary epidemiological cohort data that have tens of variables and exhibit class skew.
https://doi.org/10.2478/fcds-2014-0015
collection DOAJ
language English
format Article
sources DOAJ
author Niemann Uli
Spiliopoulou Myra
Völzke Henry
Kühn Jens-Peter
title Subpopulation Discovery in Epidemiological Data with Subspace Clustering
publisher Sciendo
series Foundations of Computing and Decision Sciences
issn 2300-3405
publishDate 2014-12-01
description A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors towards an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment before cluster discovery and quality assessment after learning the clusters. Epidemiological data usually do not have a ground truth for the verification of clusters found in subspaces. Hence, we introduce quality assessment through juxtaposition of the learned models to “models-of-randomness”, i.e. models that do not reflect a true cluster structure. On the basis of this workflow, we select subspace clustering methods, compare and discuss their performance. We use a dataset with hepatic steatosis as outcome, but our findings apply to arbitrary epidemiological cohort data that have tens of variables and exhibit class skew.
url https://doi.org/10.2478/fcds-2014-0015