CREATE: Clinical Record Analysis Technology Ensemble
In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ing...
Main Author: | Eglowski, Skylar |
---|---|
Format: | Others |
Published: | DigitalCommons@CalPoly, 2017 |
Subjects: | clinical data analysis; natural language processing; N-GRID challenge |
Online Access: | https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses |
id |
ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2994 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-29942021-08-31T05:02:20Z CREATE: Clinical Record Analysis Technology Ensemble Eglowski, Skylar In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ingestion/cleaning step was followed by the (ii) derivation of a base set of features defined using text analytics, after which (iii) association rule learning was used in a novel way to generate new features, followed by a (iv) feature selection step to eliminate irrelevant features, followed by a (v) classifier training algorithm in which a total of 22 classifiers including new classifier variants of AdaBoost and RandomForest were trained on seven different data views, and (vi) finally an ensemble learning step, in which ensembles of best learners were used to improve on the accuracy of individual learners. All of this was tested via standard 10-fold cross-validation on training data provided by the N-GRID challenge organizers, of which the three best ensembles were selected for submission to N-GRID's blind testing. The best of our submitted solutions garnered an overall final score of 0.863 according to the organizer's measure. All 3 of our submissions placed within the top 10 out of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDOC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing. 2017-06-01T07:00:00Z text application/pdf https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses Master's Theses DigitalCommons@CalPoly clinical data analysis natural language processing N-GRID challenge |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
clinical data analysis; natural language processing; N-GRID challenge |
description |
In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ingestion and cleaning step is followed by (ii) the derivation of a base set of text-analytics features, after which (iii) association rule learning is used in a novel way to generate new features, (iv) a feature selection step eliminates irrelevant features, (v) a total of 22 classifiers, including new variants of AdaBoost and RandomForest, are trained on seven different data views, and (vi) an ensemble learning step combines the best learners to improve on the accuracy of the individual learners. The full pipeline was evaluated with standard 10-fold cross-validation on the training data provided by the N-GRID challenge organizers, and the three best ensembles were submitted to N-GRID's blind test. The best of these submissions achieved an overall final score of 0.863 on the organizers' evaluation measure, and all three placed in the top 10 of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing. |
author |
Eglowski, Skylar |
title |
CREATE: Clinical Record Analysis Technology Ensemble |
publisher |
DigitalCommons@CalPoly |
publishDate |
2017 |
url |
https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses |
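The description above summarizes a six-step stacked pipeline without giving implementation details. As a rough, hypothetical illustration of steps (iii) and (vi) only, the Python sketch below mines pairwise association rules of the form `{a, b} -> severity` over binary text features (keeping rules that pass assumed support and confidence thresholds), adds each rule's antecedent as a new conjunction feature, and combines several classifiers' predictions by plurality vote. The function names (`mine_pair_rules`, `add_rule_features`, `majority_vote`), the thresholds, the tie-breaking rule, and the toy records are all assumptions made for this example; they are not taken from the thesis, whose actual rule-based feature construction and ensembling may differ.

```python
# Hypothetical sketch: pairwise association-rule features plus a plurality-vote
# ensemble. Not the thesis's implementation; names and thresholds are assumed.
from collections import Counter
from itertools import combinations


def mine_pair_rules(records, labels, min_support=0.3, min_confidence=0.9):
    """Mine rules {a, b} -> label over sets of binary text features.

    support    = (#records with a, b and this label) / (#records)
    confidence = (#records with a, b and this label) / (#records with a and b)
    """
    n = len(records)
    pair_counts = Counter()
    pair_label_counts = Counter()
    for features, label in zip(records, labels):
        for pair in combinations(sorted(features), 2):
            pair_counts[pair] += 1
            pair_label_counts[(pair, label)] += 1
    rules = []
    for (pair, label), joint in pair_label_counts.items():
        support = joint / n
        confidence = joint / pair_counts[pair]
        if support >= min_support and confidence >= min_confidence:
            rules.append((pair, label, support, confidence))
    return sorted(rules)


def add_rule_features(records, rules):
    """Add one binary conjunction feature (e.g. 'anxiety&panic') per rule antecedent."""
    antecedents = sorted({pair for pair, *_ in rules})
    augmented = []
    for features in records:
        new_features = set(features)
        for a, b in antecedents:
            if a in features and b in features:
                new_features.add(f"{a}&{b}")
        augmented.append(new_features)
    return augmented


def majority_vote(per_classifier_predictions):
    """Combine per-classifier prediction lists by plurality vote.

    Ties go to the lowest tied severity label (an arbitrary, assumed choice).
    """
    combined = []
    for votes in zip(*per_classifier_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


if __name__ == "__main__":
    # Toy data: each record is a set of extracted terms; labels are severities 0-3.
    records = [
        {"insomnia", "anxiety", "panic"},
        {"insomnia", "anxiety"},
        {"insomnia", "anxiety", "panic"},
        {"stable", "denies_si"},
        {"stable"},
        {"insomnia"},
    ]
    labels = [3, 2, 3, 0, 0, 1]

    rules = mine_pair_rules(records, labels)
    for pair, label, support, confidence in rules:
        print(pair, "->", label, f"support={support:.2f}", f"confidence={confidence:.2f}")
    print(sorted(add_rule_features(records, rules)[0]))
    print(majority_vote([[3, 2, 0], [3, 1, 0], [2, 1, 0]]))
```

Running the script prints the two rules `('anxiety', 'panic') -> 3` and `('insomnia', 'panic') -> 3`, each with support 0.33 and confidence 1.00. In a cross-validated setup such as the 10-fold evaluation mentioned in the description, rules like these would have to be mined on each training fold only; mining them on all labeled data first would leak held-out labels into the derived features.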