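Combining a Chinese Word Segmentation System and a Co-Clustering Algorithm to Analyze Music-Related Facebook Fan Pages: A Case Study of KKBOX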

Bibliographic Details
Main Authors: 陳柏羽, Chen, Po Yu
Language: Chinese
Published: National Chengchi University (國立政治大學)
Subjects:
Online Access:http://thesis.lib.nccu.edu.tw/cgi-bin/cdrfb3/gsweb.cgi?o=dstdcdr&i=sid=%22G0102753012%22.
Description
Summary: The recent spread of smartphones and Internet access has fueled rapid growth in social networking sites and online music streaming. As of last year, Facebook averaged 1.86 billion monthly users, and fan pages have become a marketing channel that companies watch closely. Posts on a fan page can reach users' feeds within a short time through views and shares, achieving better results than television advertising at a much lower cost. This study presents a workflow for clustering posts from Facebook fan pages. Because the vocabulary of these posts is complex, we collected not only the fan page posts but also related information from the KKBOX website, and used the KKBOX data to expand the corpus of the Chinese word segmentation system Jieba and thereby improve segmentation accuracy. We then clustered the posts with the Minimum Squared Residue Co-Clustering Algorithm and evaluated the clustering results using the Discrimination Rate and Agglomerate Rate together with distribution plots produced by Principal Component Analysis. The better clustering results were selected for further analysis to identify the basis on which the posts were grouped. The results show that the method effectively separates different types of posts and can even group posts by differences in word usage, syntax, or layout format.
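The co-clustering step named in the summary can be illustrated with a minimal sketch. It assumes one common definition of the residue, namely each entry of a post-by-term matrix minus the mean of the co-cluster (block) it falls in, and alternates row and column reassignments to reduce the sum of squared residues. The record does not state which residue variant, initialization, or term weighting the thesis used, so the function names, the toy matrix, and the starting assignments below are all hypothetical.

```python
# Minimal sketch of squared-residue co-clustering on a toy post-by-term matrix.
# Assumption: residue = entry minus the mean of its (row-cluster, column-cluster) block.
# All names and data here are illustrative, not taken from the thesis.

def block_means(A, rows, cols, k, l):
    """Mean of every (row-cluster, column-cluster) block; empty blocks get 0.0."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            sums[rows[i]][cols[j]] += value
            counts[rows[i]][cols[j]] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(l)] for r in range(k)]

def squared_residue(A, rows, cols, k, l):
    """Objective: sum of squared differences between entries and their block means."""
    means = block_means(A, rows, cols, k, l)
    return sum((value - means[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(A) for j, value in enumerate(row))

def co_cluster(A, rows, cols, k, l, max_iter=50):
    """Alternate row and column reassignments until the labels stop changing."""
    rows, cols = list(rows), list(cols)
    n, m = len(A), len(A[0])
    for _ in range(max_iter):
        previous = (rows[:], cols[:])
        # Row step: move each post to the row cluster whose block means fit it best.
        means = block_means(A, rows, cols, k, l)
        rows = [min(range(k), key=lambda r: sum(
                    (A[i][j] - means[r][cols[j]]) ** 2 for j in range(m)))
                for i in range(n)]
        # Column step: same idea for terms, using the updated row labels.
        means = block_means(A, rows, cols, k, l)
        cols = [min(range(l), key=lambda c: sum(
                    (A[i][j] - means[rows[i]][c]) ** 2 for i in range(n)))
                for j in range(m)]
        if (rows, cols) == previous:
            break
    return rows, cols

# Toy counts: posts 0-1 use terms 0-1, posts 2-3 use terms 2-3.
A = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 3],
     [0, 0, 3, 4]]
post_labels, term_labels = co_cluster(A, [0, 0, 0, 1], [0, 0, 0, 1], k=2, l=2)
print(post_labels, term_labels, squared_residue(A, post_labels, term_labels, 2, 2))
```

This kind of alternating search only reaches a local minimum, so the outcome depends on the starting assignments. In practice one would likely try several initializations and compare the resulting partitions, which is consistent with the summary's step of evaluating and selecting the better clustering results.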