Query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the National Sleep Research Resource

Abstract Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on such datasets wi...


Bibliographic Details
Main Authors: Rashmie Abeysinghe, Licong Cui
Format: Article
Language: English
Published: BMC 2018-07-01
Series: BMC Medical Informatics and Decision Making
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12911-018-0633-7
id doaj-4a14c141ae3f4a15ada9e9afd7681e07
record_format Article
spelling doaj-4a14c141ae3f4a15ada9e9afd7681e07
2020-11-24T21:55:49Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2018-07-01
18S289100
10.1186/s12911-018-0633-7
Query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the National Sleep Research Resource
Rashmie Abeysinghe 0
Licong Cui 1
Department of Computer Science
Department of Computer Science
http://link.springer.com/article/10.1186/s12911-018-0633-7
Query-constraint-based association rule mining
National sleep research resource
Exploratory data analysis
collection DOAJ
language English
format Article
sources DOAJ
author Rashmie Abeysinghe
Licong Cui
author_facet Rashmie Abeysinghe
Licong Cui
author_sort Rashmie Abeysinghe
title Query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the National Sleep Research Resource
title_sort query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the national sleep research resource
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2018-07-01
description Abstract
Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on them yields a large number of rules, many of which may be uninteresting. For imbalanced datasets in particular, performing ARM directly yields uninteresting rules dominated by variables that capture general characteristics.
Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that only unique, non-redundant rules remain for a particular query constraint.
Results: Applying QARM to five datasets from the NSRR yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables can avoid uninteresting rules, and that removing general and subsumed rules produces a more concise and interesting set of rules.
Conclusions: QARM shows potential to support exploratory analysis of large biomedical datasets. It is also shown to be a useful method for reducing the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence from biomedical literature, while others without literature-based evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
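The Methods summary above (restrict the data to records satisfying a query constraint, mine rules at a minimum confidence, keep the top k, then drop subsumed rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a brute-force enumeration stands in for the TNR algorithm, the item names and toy records are invented for illustration, and the subsumption test (same consequent, strictly smaller antecedent, confidence at least as high) is an assumed reading of "subsumed". Removal of "general" rules is omitted because the abstract does not define it.

```python
from itertools import combinations


def filter_by_query(transactions, query_items):
    """Keep transactions containing every query item; strip the query items
    so they do not reappear trivially in every mined rule (a design choice)."""
    query = frozenset(query_items)
    return [t - query for t in transactions if query <= t]


def mine_rules(transactions, min_conf=0.6, k=100, max_antecedent=2):
    """Brute-force stand-in for TNR: rules (antecedent, consequent, support,
    confidence) with a single-item consequent, top k by support."""
    n = len(transactions)
    if n == 0:
        return []
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            covered = [t for t in transactions if a <= t]
            if not covered:
                continue
            for c in items:
                if c in a:
                    continue
                both = sum(1 for t in covered if c in t)
                conf = both / len(covered)
                if both > 0 and conf >= min_conf:
                    rules.append((a, c, both / n, conf))
    # Highest support first, then highest confidence; names break ties.
    rules.sort(key=lambda r: (-r[2], -r[3], sorted(r[0]), r[1]))
    return rules[:k]


def drop_subsumed(rules):
    """Drop a rule when another rule with the same consequent has a strictly
    smaller antecedent and at least the same confidence (assumed definition)."""
    kept = []
    for a, c, sup, conf in rules:
        subsumed = any(
            c2 == c and a2 < a and conf2 >= conf
            for a2, c2, _, conf2 in rules
        )
        if not subsumed:
            kept.append((a, c, sup, conf))
    return kept


# Hypothetical toy records; item names are illustrative, not NSRR variables.
records = [
    frozenset({"osa=yes", "bmi=high", "snoring=yes", "age=older"}),
    frozenset({"osa=yes", "bmi=high", "snoring=yes", "age=older"}),
    frozenset({"osa=yes", "bmi=high", "snoring=yes"}),
    frozenset({"osa=yes", "snoring=yes"}),
    frozenset({"osa=no", "bmi=high"}),
    frozenset({"osa=no", "bmi=normal"}),
]

subset = filter_by_query(records, {"osa=yes"})
rules = mine_rules(subset, min_conf=0.6, k=100)
kept = drop_subsumed(rules)
for antecedent, consequent, support, confidence in kept:
    print(sorted(antecedent), "->", consequent, round(support, 2), round(confidence, 2))
```

Mining only within the query subset is what keeps variables that dominate the full dataset from crowding out rules specific to the queried group; a production pipeline would swap the brute-force step for a real top-k miner.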
topic Query-constraint-based association rule mining
National sleep research resource
Exploratory data analysis
url http://link.springer.com/article/10.1186/s12911-018-0633-7
work_keys_str_mv AT rashmieabeysinghe queryconstraintbasedminingofassociationrulesforexploratoryanalysisofclinicaldatasetsinthenationalsleepresearchresource
AT licongcui queryconstraintbasedminingofassociationrulesforexploratoryanalysisofclinicaldatasetsinthenationalsleepresearchresource
_version_ 1725861165504921600