Extracting Events from Social Media and the Web
Main Author: | Zong, Shi |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2020 |
Subjects: | Computer Science |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1606487414850247 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1606487414850247 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1606487414850247 2021-10-16T05:25:16Z Extracting Events from Social Media and the Web Zong, Shi Computer Science The last decade has witnessed tremendous advances in technology, which have led to an explosion of user-generated text, including the web and short informal texts in microblogs such as Twitter. This motivates the need for automatic text processing techniques to extract, aggregate and analyze a volume of information that no one could handle manually. In this thesis, we present three efforts that use computational linguistics approaches to extract events from social media and the web. For each study, we develop resources and models for extracting structured information from unstructured data. We then demonstrate the value of analyzing these extracted events. In the first study, we analyze the perceived severity of cybersecurity threats reported on social media. We build a sensor that automatically scans tweets mentioning cybersecurity threats and evaluates each threat's severity based on the language used to describe it. Our experimental results show that our predicted severity scores are correlated with the expert-assigned scores in the National Vulnerability Database (NVD) and can be used as an indicator of whether a threat will be exploitable in the wild. In the second study, we examine people's linguistic behavior when they make predictions about future events. We extract predictions from the geopolitical and financial domains and investigate a number of linguistic metrics over people's justifications of their predictions. We further demonstrate the possibility of accurately predicting forecasting skill using a model that is based solely on language. In the third study, we present a corpus that can be used for automatically extracting COVID-19 related events from Twitter. Based on our newly, manually annotated dataset, we build a semantic search system that allows users to search for a variety of information using different queries related to COVID-19, such as "Who tested positive that has close contact with Boris Johnson?" or "What are the cure methods that people think effective?" We believe this semantic search system could help address information overload for professionals who want to stay on top of recent developments related to COVID-19. 2020 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1606487414850247 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science |
spellingShingle |
Computer Science Zong, Shi Extracting Events from Social Media and the Web |
author |
Zong, Shi |
title |
Extracting Events from Social Media and the Web |
title_sort |
extracting events from social media and the web |
publisher |
The Ohio State University / OhioLINK |
publishDate |
2020 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=osu1606487414850247 |
work_keys_str_mv |
AT zongshi extractingeventsfromsocialmediaandtheweb |
_version_ |
1719489976177000448 |