CREATE: Clinical Record Analysis Technology Ensemble

In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ing...


Bibliographic Details
Main Author: Eglowski, Skylar
Format: Others
Published: DigitalCommons@CalPoly 2017
Subjects: clinical data analysis; natural language processing; N-GRID challenge
Online Access: https://digitalcommons.calpoly.edu/theses/1771
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses
id ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2994
record_format oai_dc
spelling ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2994 2021-08-31T05:02:20Z CREATE: Clinical Record Analysis Technology Ensemble Eglowski, Skylar 2017-06-01T07:00:00Z text application/pdf https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses Master's Theses DigitalCommons@CalPoly clinical data analysis natural language processing N-GRID challenge
collection NDLTD
format Others
sources NDLTD
topic clinical data analysis
natural language processing
N-GRID challenge
description In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture with six steps: (i) base data ingestion and cleaning; (ii) derivation of a base set of features using text analytics; (iii) association rule learning, used in a novel way to generate new features; (iv) feature selection to eliminate irrelevant features; (v) classifier training, in which a total of 22 classifiers, including new variants of AdaBoost and RandomForest, were trained on seven different data views; and (vi) ensemble learning, in which ensembles of the best learners were used to improve on the accuracy of individual learners. The full pipeline was evaluated with standard 10-fold cross-validation on training data provided by the N-GRID challenge organizers, and the three best ensembles were selected for submission to N-GRID's blind testing. The best of our submitted solutions achieved an overall final score of 0.863 according to the organizers' measure. All three of our submissions placed within the top 10 of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing.
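The evaluation and ensemble steps named in the description can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the record does not say how the thesis combines learners or how folds were drawn, so plain majority voting and a simple interleaved 10-fold split stand in for them, labels are assumed to be coded 0 to 3, and the function names (k_fold_indices, majority_vote, accuracy) are hypothetical. Plain accuracy is used as a stand-in and is not the organizers' scoring measure.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Label = int  # severity on the 4-point scale; coding as 0..3 is an assumption


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs with near-equal folds."""
    # Interleaved assignment: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits


def majority_vote(predictions: Sequence[Sequence[Label]]) -> List[Label]:
    """Combine per-learner predictions; ties go to the lowest label (assumption)."""
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per record
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == best))
    return combined


def accuracy(y_true: Sequence[Label], y_pred: Sequence[Label]) -> float:
    """Fraction of exact matches; a stand-in, not the challenge's official score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In use, each base learner would be trained on the train indices of a split and asked to predict the test indices; the per-learner prediction lists would then be passed to majority_vote and scored fold by fold.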
author Eglowski, Skylar
title CREATE: Clinical Record Analysis Technology Ensemble
publisher DigitalCommons@CalPoly
publishDate 2017
url https://digitalcommons.calpoly.edu/theses/1771
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses