Hashtag Recommendation of Streaming Short Text by Topic Model Enhanced Semi-Supervised Learning

Bibliographic Details
Main Authors: Ji-De Chen, 陳吉德
Other Authors: Hung-Yu Kao
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/05422098219246891375
Description
Summary: Master's === National Cheng Kung University === Department of Computer Science and Information Engineering === 103 === With the rapid growth of real-time social media such as Twitter, many users share and discuss topics of interest on these platforms. A hashtag is a type of metadata tag that lets users annotate the topics of their tweets. Hashtags are also useful for research: for example, observing hashtag trends can improve the performance of event detection. Although Twitter has grown rapidly, hashtag usage has not grown as expected: in our dataset, fewer than 20% of all tweets contain hashtags. A likely cause is that most users do not know which hashtags suit the tweets they post. Recommending suitable hashtags to users is therefore one way to address the low hashtag usage rate. Hashtag recommendation is a supervised learning problem, in which providing more labeled data to train the model can yield better prediction performance. However, labeled data for hashtag recommendation is scarce because of the low usage rate. To address this problem, we exploit unlabeled data, i.e., non-hashtag tweets. Non-hashtag tweets are self-labeled with virtual hashtags by a topic model and used to extend the training data. Directly adding all non-hashtag tweets may not help train the model, however, because some of them are noisy. To overcome this issue, we apply weight-updating mechanisms to filter out the useless non-hashtag tweets, which may not have any appropriate hashtag. These mechanisms also take into account the temporal characteristics of hashtags, owing to the real-time nature of Twitter. The experimental results show that extending the original training data with effective non-hashtag tweets outperforms baseline methods that train the model on labeled data only.
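
The self-labeling and filtering idea in the summary can be illustrated with a small sketch. This is not the thesis's actual method, which the abstract does not specify. It assumes that each tweet already has a topic vector from some topic model (for example, LDA), that a virtual hashtag is chosen by cosine similarity to per-hashtag centroids, and that the weight combines that similarity with an exponential time decay. The function names, the `half_life` and `min_weight` parameters, and the sample data are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hashtag_centroids(labeled):
    """Average the topic vectors of the labeled tweets for each hashtag."""
    sums, counts = {}, defaultdict(int)
    for topics, hashtag in labeled:
        if hashtag not in sums:
            sums[hashtag] = [0.0] * len(topics)
        sums[hashtag] = [s + t for s, t in zip(sums[hashtag], topics)]
        counts[hashtag] += 1
    return {h: [s / counts[h] for s in vec] for h, vec in sums.items()}


def self_label(unlabeled, ages_hours, centroids, half_life=24.0, min_weight=0.9):
    """Give each unlabeled tweet a virtual hashtag and a weight.

    Assumption: weight = topic similarity * exponential time decay.
    Tweets whose weight falls below min_weight are treated as noise and dropped.
    """
    extended = []
    for topics, age in zip(unlabeled, ages_hours):
        best_tag, best_sim = max(
            ((h, cosine(topics, c)) for h, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        decay = 0.5 ** (age / half_life)  # older tweets count for less
        weight = best_sim * decay
        if weight >= min_weight:
            extended.append((topics, best_tag, weight))
    return extended


# Hypothetical two-topic vectors; a real system would infer them with a topic model.
labeled = [([0.9, 0.1], "#nba"), ([0.8, 0.2], "#nba"), ([0.1, 0.9], "#oscars")]
unlabeled = [[0.85, 0.15], [0.5, 0.5], [0.2, 0.8]]
ages_hours = [0.0, 0.0, 48.0]

centroids = hashtag_centroids(labeled)
extended = self_label(unlabeled, ages_hours, centroids)
for topics, tag, weight in extended:
    print(tag, round(weight, 3))
```

In this toy run, only the first non-hashtag tweet is kept: the second is too ambiguous between topics, and the third matches a hashtag well but is too old under the assumed decay. The surviving tweets and their weights could then be appended to the labeled training set.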