CREATE: Clinical Record Analysis Technology Ensemble

In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ing...


Bibliographic Details
Main Author:	Eglowski, Skylar
Format:	Others
Published:	DigitalCommons@CalPoly 2017
Subjects:	clinical data analysis natural language processing N-GRID challenge
Online Access:	https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses

id	ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2994
record_format	oai_dc
spelling	ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2994 2021-08-31T05:02:20Z 2017-06-01T07:00:00Z text application/pdf Master's Theses DigitalCommons@CalPoly
collection	NDLTD
format	Others
sources	NDLTD
topic	clinical data analysis natural language processing N-GRID challenge
description	In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The task was to predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture with six stages: (i) a base data ingestion and cleaning step; (ii) derivation of a base set of features using text analytics; (iii) association rule learning, used in a novel way to generate new features; (iv) a feature selection step to eliminate irrelevant features; (v) a classifier training step in which a total of 22 classifiers, including new variants of AdaBoost and RandomForest, were trained on seven different data views; and (vi) an ensemble learning step in which ensembles of the best learners were used to improve on the accuracy of individual learners. All of this was evaluated via standard 10-fold cross-validation on training data provided by the N-GRID challenge organizers, and the three best ensembles were selected for submission to N-GRID's blind testing. The best of our submitted solutions achieved an overall final score of 0.863 according to the organizers' measure. All three of our submissions placed within the top 10 of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing.
author	Eglowski, Skylar
author_facet	Eglowski, Skylar
author_sort	Eglowski, Skylar
title	CREATE: Clinical Record Analysis Technology Ensemble
title_sort	create: clinical record analysis technology ensemble
publisher	DigitalCommons@CalPoly
publishDate	2017
url	https://digitalcommons.calpoly.edu/theses/1771 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2994&context=theses
work_keys_str_mv	AT eglowskiskylar createclinicalrecordanalysistechnologyensemble
_version_	1719473025314717696
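
The description field above mentions 10-fold cross-validation and an ensemble step built from the best learners. The following is a minimal Python sketch of those two ideas only, not the thesis implementation. The record does not say how folds were assigned or how ensemble members were combined, so the round-robin fold assignment, the majority vote, the 0 to 3 label encoding, and every name here (`k_fold_indices`, `majority_vote`, `ensemble_predict`) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Label = int  # severity on a 4-point scale, encoded as 0..3 (assumed encoding)


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering range(n).

    Folds are assigned round-robin, which is an assumption; the record
    only says "standard 10-fold cross-validation".
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


def majority_vote(predictions: Sequence[Label]) -> Label:
    """Return the most frequent label; ties go to the lowest label."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def ensemble_predict(
    cv_scores: Dict[str, float],
    predictions: Dict[str, List[Label]],
    top_n: int = 3,
) -> List[Label]:
    """Combine the top_n learners (ranked by cross-validation score) by majority vote."""
    chosen = sorted(cv_scores, key=lambda name: cv_scores[name], reverse=True)[:top_n]
    per_record = zip(*(predictions[name] for name in chosen))
    return [majority_vote(votes) for votes in per_record]
```

In this sketch, `cv_scores` would hold one cross-validation score per trained classifier and `predictions` the labels each classifier assigns to the same records; both are hypothetical inputs, not data from the thesis.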


Similar Items