Piecing together the puzzle: Improving event content coverage for real-time sub-event detection using adaptive microblog crawling.

In an age when people are predisposed to report real-world events through their social media accounts, many researchers value the benefits of mining user generated content from social media. Compared with the traditional news media, social media services, such as Twitter, can provide more complete a...

Full description

Bibliographic Details
Main Authors: Laurissa Tokarchuk, Xinyue Wang, Stefan Poslad
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2017-01-01
Series:PLoS ONE
Online Access:http://europepmc.org/articles/PMC5673163?pdf=render
Description
Summary:In an age when people are predisposed to report real-world events through their social media accounts, many researchers value the benefits of mining user generated content from social media. Compared with the traditional news media, social media services, such as Twitter, can provide more complete and timely information about the real-world events. However events are often like a puzzle and in order to solve the puzzle/understand the event, we must identify all the sub-events or pieces. Existing Twitter event monitoring systems for sub-event detection and summarization currently typically analyse events based on partial data as conventional data collection methodologies are unable to collect comprehensive event data. This results in existing systems often being unable to report sub-events in real-time and often in completely missing sub-events or pieces in the broader event puzzle. This paper proposes a Sub-event detection by real-TIme Microblog monitoring (STRIM) framework that leverages the temporal feature of an expanded set of news-worthy event content. In order to more comprehensively and accurately identify sub-events this framework first proposes the use of adaptive microblog crawling. Our adaptive microblog crawler is capable of increasing the coverage of events while minimizing the amount of non-relevant content. We then propose a stream division methodology that can be accomplished in real time so that the temporal features of the expanded event streams can be analysed by a burst detection algorithm. In the final steps of the framework, the content features are extracted from each divided stream and recombined to provide a final summarization of the sub-events. The proposed framework is evaluated against traditional event detection using event recall and event precision metrics. Results show that improving the quality and coverage of event contents contribute to better event detection by identifying additional valid sub-events. The novel combination of our proposed adaptive crawler and our stream division/recombination technique provides significant gains in event recall (44.44%) and event precision (9.57%). The addition of these sub-events or pieces, allows us to get closer to solving the event puzzle.
ISSN:1932-6203