Summary: | Doctoral === National Taiwan University === Graduate Institute of Electrical Engineering === 105 === Social media platforms have recently emerged as a powerful, real-time means of communication. People use social media to share and
exchange information about all kinds of events, ranging from breaking news stories to natural disasters and local festivals. With the rapid development of mobile technologies, messages posted on social media typically reflect these events as they happen. However, because of the dramatic growth of social media data, it has become infeasible for users to read every post or comment. Therefore, mining and summarizing rich user-generated content in social media presents great opportunities for developing many potential applications (e.g., breaking news discovery, traffic monitoring, and natural disaster monitoring). In addition, celebrities, corporations, and organizations set up social pages to interact with their fans and the public. Although it is important for them to understand how their fans and customers react to certain topics and content, the volume and rapid growth of social media make it time-consuming to get an overview of a comment stream. Therefore, in this dissertation, we first propose a significant URL mining approach (named SURLMINE) that ranks the URLs shared on social media based on various features. Note that URLs are language-independent. It is also worth noting that only 35\% of tweets on Twitter are posted in English. In other words, mining social media content through URLs can draw on data in many different languages. Above all, it is efficient, and nothing is lost in translation. To summarize comment streams, we propose a real-time incremental short
text summarization approach for comment streams (abbreviated as IncreSTS) that provides an at-a-glance presentation from which users can easily and rapidly understand the main points of similar comments. Experiments conducted on real datasets show that SURLMINE reaches up to 92\% precision on YouTube datasets and that IncreSTS offers high efficiency, high scalability, and better handling of outliers on the target problem.
|