Twitter search : building a useful search engine

Bibliographic Details
Main Author: Hurlock, Jonathan
Published: Swansea University 2015
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.752356
Description
Summary: Millions of digital communications are posted over social media every day. Whilst some state that a large proportion of these posts are babble, we know that some of them contain useful information. In this thesis we look at what makes some of these communications useful or not useful to someone searching for information over social media. In particular, we look at what makes messages (tweets) from the social network Twitter useful or not useful to users performing search over a corpus of tweets. We identify 16 features that help a tweet be deemed useful, and 17 features that may lead a tweet to be deemed not useful to someone performing a search task. From these findings we describe a distributed architecture we have built to process large datasets and to perform search over a corpus of tweets. Using this architecture, we index tweets based on our findings. We also describe a crowdsourcing study we ran to help optimize weightings for these features via learning to rank, which quantifies how important each feature is in determining what makes tweets useful or not for common search tasks performed over Twitter. We release a corpus of tweets for the purpose of evaluating other usefulness systems.