Mining User-generated Content for Insights

Bibliographic Details
Main Author: Angel, Albert-David
Other Authors: Koudas, Nick
Language: en_ca
Published: 2012
Subjects:
Online Access: http://hdl.handle.net/1807/32650
Description
Summary: The proliferation of social media, such as blogs, micro-blogs and social networks, has led to a plethora of readily available user-generated content. The latter offers a unique, uncensored window into emerging stories and events, ranging from politics and revolutions to product perception and the zeitgeist. Importantly, structured information is available for user-generated content, by dint of its metadata, or can be surfaced via recently commoditized information extraction tools. This wealth of information, in the form of real-world entities and facts mentioned in a document, author demographics, and so on, provides exciting opportunities for mining insights from this content. Capitalizing upon these, we develop Grapevine, an online system that distills information from the social media collective on a daily basis, and facilitates its interactive exploration. To further this goal, we address important research problems, which are also of independent interest. The sheer scale of the data being processed necessitates that our solutions be highly efficient. We propose efficient techniques for mining important stories, on a per-user-demographic basis, based on named entity co-occurrences in user-generated content. Building upon these, we propose efficient techniques for identifying emerging stories as they happen, by identifying dense structures in an evolving entity graph. To facilitate the exploration of these stories, we propose efficient techniques for filtering them, based on users’ textual descriptions of the entities involved. These gathered insights need to be presented to users in a useful manner, via a diverse set of representative documents; we thus propose efficient techniques for addressing this problem. Recommending related stories to users is important for navigation purposes. As the way in which these are related to the story being explored is not always clear, we propose efficient techniques for generating recommendation explanations via entity relatedness queries.
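
The idea of mining stories from named entity co-occurrences can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes each document has already been reduced to a list of entity names, counts how often pairs of entities appear together, and uses connected components of the thresholded graph as a simple stand-in for the "dense structures" the summary mentions. The function names, the `min_count` threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered entity pair, the documents mentioning both."""
    counts = Counter()
    for entities in docs:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def candidate_stories(docs, min_count=2):
    """Group entities linked by frequent co-occurrence.

    Assumption: connected components stand in for dense subgraphs;
    a real system would likely use a stricter density criterion.
    """
    graph = defaultdict(set)
    for (a, b), n in cooccurrence_counts(docs).items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)

    seen, stories = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        stories.append(sorted(component))
    return stories


# Made-up sample: each inner list is the entity set of one document.
docs = [
    ["NASA", "Mars", "Curiosity"],
    ["NASA", "Mars"],
    ["Apple", "iPhone"],
    ["Apple", "iPhone", "Samsung"],
    ["Curiosity", "Samsung"],
]
stories = candidate_stories(docs)
```

With this sample, only the pairs (Mars, NASA) and (Apple, iPhone) co-occur in at least two documents, so they form the two candidate stories; pairs seen once are dropped by the threshold.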