Summary: | Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 100 === Because of limits on content length, microblog users express their opinions in condensed text mixed with non-textual content. Moreover, user-generated content often includes chaotic messages, useless information, or information unrelated to the theme of the original post. Microblog posts and responses also contain Network Informal Language (NIL), such as abbreviations, misspellings, and phonetic words. In this paper, a novel approach to Maximum Discussion Group Detection (MDGD) from each post and its responses is proposed. Briefly, the Maximum Discussion Groups (MDGs) with a higher degree of user participation are selected, and significant terms are extracted from the unconventional expressions in microblog posts using modified NIL and Lexical Chain models. To enrich the fusion results, we retrieve related content from multiple microblog platforms according to the previously extracted terms.
In the experiments, we use a test data set collected from the microblog platforms Plurk and Facebook, consisting of content that includes the terms “林書豪” (Jeremy Lin), “馬英九” (Ma Ying-jeou), and “蔡英文” (Tsai Ing-wen). An NIL dictionary is then constructed for the ENIL model. Compared with CKIP, the segmentation results indicate that ENIL improves precision significantly, by 7.4% to 17.5%. Finally, the NDCG metric is used to evaluate user satisfaction with the fusion results. The user satisfaction results show that our system is capable of providing fused results of acceptable quality.
|
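The abstract evaluates user satisfaction with NDCG but does not state which variant was used. As a minimal sketch, assuming the common formulation DCG = sum of rel_i / log2(i + 1) over ranks i starting at 1, NDCG for one ranked list of user ratings could be computed as follows. The function names and the sample ratings are hypothetical and are not taken from the thesis.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of ratings given in ranked order.

    Assumes the rel / log2(rank + 1) discount; the thesis may use another variant.
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A list with no relevant items has no meaningful ranking quality.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical user ratings (0 = irrelevant, 3 = highly relevant)
# for five fused results, in the order the system presented them.
ratings = [3, 2, 3, 0, 1]
score = ndcg(ratings)
```

A score of 1.0 means the system already presented the results in the order users rated best; lower values indicate that highly rated results appeared further down the list.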