Health Information Extraction from Social Media

abstract: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps...

Bibliographic Details
Other Authors:	Nikfarjam, Azadeh (Author)
Format:	Doctoral Thesis
Language:	English
Published:	2016
Subjects:	Bioinformatics; Artificial intelligence; Public health; Deep Learning; Information Extraction; Machine Learning; Natural Language Processing; Pharmacovigilance; Social Media Mining
Online Access:	http://hdl.handle.net/2286/R.I.40354

id	ndltd-asu.edu-item-40354
record_format	oai_dc
spelling	ndltd-asu.edu-item-40354 2018-06-22T03:07:52Z Health Information Extraction from Social Media abstract: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where the mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, particularly deep learning, in addressing the challenges associated with the extraction of health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. However, thus far, deep learning techniques are relatively unexplored for biomedical text mining and, in particular, this is the first attempt at applying deep learning to health information extraction from social media. This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger for the extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems. This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions that were not observed in the training data or in the standard medical lexicons, outperforming the state-of-the-art baseline techniques. Dissertation/Thesis Nikfarjam, Azadeh (Author) Gonzalez, Graciela (Advisor) Greenes, Robert (Committee member) Scotch, Matthew (Committee member) Arizona State University (Publisher) Bioinformatics Artificial intelligence Public health Deep Learning Information Extraction Machine Learning Natural Language Processing Pharmacovigilance Social Media Mining eng 105 pages Doctoral Dissertation Biomedical Informatics 2016 Doctoral Dissertation http://hdl.handle.net/2286/R.I.40354 http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2016
collection	NDLTD
language	English
format	Doctoral Thesis
sources	NDLTD
topic	Bioinformatics; Artificial intelligence; Public health; Deep Learning; Information Extraction; Machine Learning; Natural Language Processing; Pharmacovigilance; Social Media Mining
spellingShingle	Bioinformatics Artificial intelligence Public health Deep Learning Information Extraction Machine Learning Natural Language Processing Pharmacovigilance Social Media Mining Health Information Extraction from Social Media
author2	Nikfarjam, Azadeh (Author)
author_facet	Nikfarjam, Azadeh (Author)
title	Health Information Extraction from Social Media
title_short	Health Information Extraction from Social Media
title_full	Health Information Extraction from Social Media
title_fullStr	Health Information Extraction from Social Media
title_full_unstemmed	Health Information Extraction from Social Media
title_sort	health information extraction from social media
publishDate	2016
url	http://hdl.handle.net/2286/R.I.40354
_version_	1718701269250473984
