Automated Learning Of Health Behaviors Through Consumer Authored Natural Language Text

Traditional methods for collecting data in support of clinical research include prospectively collected surveys, retrospective analyses of existing medical records, and a combination of the two. Yet these approaches tend to focus on a medical-centric worldview and, as a result, provide only a partia...

Bibliographic Details
Main Author: Yin, Zhijun
Other Authors: Ching-Hua Chen
Format: Others
Language: en
Published: VANDERBILT 2018
Subjects: Computer Science
Online Access: http://etd.library.vanderbilt.edu/available/etd-02112018-221351/
id ndltd-VANDERBILT-oai-VANDERBILTETD-etd-02112018-221351
record_format oai_dc
collection NDLTD
language en
format Others
sources NDLTD
topic Computer Science
spellingShingle Computer Science
Yin, Zhijun
Automated Learning Of Health Behaviors Through Consumer Authored Natural Language Text
description Traditional methods for collecting data in support of clinical research include prospectively collected surveys, retrospective analyses of existing medical records, and a combination of the two. Yet these approaches tend to focus on a medical-centric worldview and, as a result, provide only a partial view of a patient's life. As distributed systems, cloud services, and mobile devices grow in sophistication and market penetration, large amounts of personal data are generated every day, particularly in online environments, where people disclose many aspects of their lives, including information related to their health. This situation provides an opportunity for healthcare providers and biomedical researchers to learn about patients in their own voice, beyond traditional data sources. However, collecting, processing, and acting upon self-authored natural language text poses challenges for automatically extracting health-related information, including, but not limited to, ambiguity in communication, noisy data, long expositions that contain many different types of health information, and high dimensionality in predictive model interoperability. This dissertation applies a data-driven approach to investigate how self-authored information in three different online environments can be relied upon to learn about health-related behaviors. Specifically, this dissertation investigates three foundational questions. First, how do individuals disclose health status through a general social media platform (e.g., Twitter)? Second, can patients' long-term treatment adherence be inferred through online health communities (e.g., forums in breastcancer.org)? Third, how can we learn patients' needs based on the messages they send to healthcare providers over a patient portal connected to an electronic medical record (EMR) system that is ingrained in the everyday functions of a large academic medical center?
To process consumer-authored natural language text, this dissertation illustrates how to combine text mining, machine learning, and statistical inference to 1) extract health-related events (e.g., adherence status), 2) create interpretable factors (e.g., semantic groups), 3) build efficient predictive models (e.g., predicting medication interruption events), and 4) learn meaningful health-related associations (e.g., semantics and health status disclosure, emotions and portrayal of adherence status, topics and medication adherence). It is shown that many factors communicated through self-authored text (e.g., emotions, personalities, and other factors that are not captured in structured EMRs) can help explain an individual's health-related behavior. This research provides evidence that self-generated information can supplement traditional data sources to facilitate healthcare research.
author2 Ching-Hua Chen
author_facet Ching-Hua Chen
Yin, Zhijun
author Yin, Zhijun
author_sort Yin, Zhijun
title Automated Learning Of Health Behaviors Through Consumer Authored Natural Language Text
title_sort automated learning of health behaviors through consumer authored natural language text
publisher VANDERBILT
publishDate 2018
url http://etd.library.vanderbilt.edu/available/etd-02112018-221351/
spelling ndltd-VANDERBILT-oai-VANDERBILTETD-etd-02112018-221351 2018-02-14T05:14:02Z Automated Learning Of Health Behaviors Through Consumer Authored Natural Language Text Yin, Zhijun Computer Science Ching-Hua Chen Daniel Fabbri Jeremy Warner Yevgeniy Vorobeychik Yuan Xue Bradley Malin VANDERBILT 2018-02-13 text application/pdf http://etd.library.vanderbilt.edu/available/etd-02112018-221351/ en restrictsix
I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee.
I hereby grant to Vanderbilt University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation, or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.