Summary: | Master's === National Taiwan University === Graduate Institute of Information Management === 102 === With the growth of social networks and their sustained increase in users, the explosion of content produces an enormous amount of information. To classify tweets efficiently and effectively, Twitter users can use hashtags to mark and categorize their tweets. However, most tweets do not contain hashtags: our research shows that only 15% of tweets contain them, which greatly reduces the value of hashtags. Therefore, our research aims to develop a hashtag recommendation system that automatically suggests hashtags based on the content of a tweet.
Our research model is built on the mixed membership model. We extend it by incorporating the temporal clustering effect and propose the resulting model, Topics over Time Multiple Channel Latent Dirichlet Allocation (TOT-MCLDA). The insight behind our model is that the text words and hashtags of a tweet are conditioned on the same latent topics. In addition, tweets posted in the same period of time are more closely related. Hence, we can use the content of a tweet to recommend hashtags through its latent topics. Experimental results on a 3-year Twitter dataset demonstrate that the proposed method can outperform some state-of-the-art methods.
|
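The recommendation step described in the summary, scoring hashtags through the latent topics a tweet shares with its words, can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the tweet's topic mixture (`theta`) and the per-topic hashtag distributions (`phi_hashtag`) have already been inferred, it omits the temporal component of TOT-MCLDA entirely, and the function name, variable names, and toy numbers are all hypothetical.

```python
def recommend_hashtags(theta, phi_hashtag, hashtags, top_n=3):
    """Rank hashtags by p(h | tweet) = sum_k theta[k] * phi_hashtag[k][h].

    theta       -- assumed topic mixture of one tweet (length K)
    phi_hashtag -- assumed K x H matrix; row k is topic k's hashtag distribution
    hashtags    -- the H hashtag strings, in column order
    """
    scores = [
        sum(theta[k] * phi_hashtag[k][h] for k in range(len(theta)))
        for h in range(len(hashtags))
    ]
    # Sort hashtag indices by descending score and keep the best top_n.
    order = sorted(range(len(hashtags)), key=lambda h: -scores[h])
    return [(hashtags[h], scores[h]) for h in order[:top_n]]


# Toy, made-up values for illustration only (not from the thesis).
hashtags = ["#nba", "#election", "#music"]
theta = [0.7, 0.2, 0.1]          # tweet is mostly about topic 0
phi_hashtag = [
    [0.8, 0.1, 0.1],             # topic 0 favors #nba
    [0.1, 0.8, 0.1],             # topic 1 favors #election
    [0.2, 0.1, 0.7],             # topic 2 favors #music
]
ranked = recommend_hashtags(theta, phi_hashtag, hashtags, top_n=2)
```

Because words and hashtags are tied to the same topics, a topic mixture estimated from the tweet text alone is enough to score hashtags the tweet never contained. A fuller sketch would also weight topics by how well they match the tweet's timestamp.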