Design and Implementation of a Twitter Data Collection and Management Service

Master's thesis === National Chengchi University (國立政治大學) === Department of Computer Science (資訊科學學系) === 101 === The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in...

Bibliographic Details
Main Authors: Chou, Yu Chun, 周玉駿
Other Authors: Chen, Kung
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/76675057003629599118
id ndltd-TW-101NCCU5394033
record_format oai_dc
spelling ndltd-TW-101NCCU5394033 2016-09-25T04:04:25Z http://ndltd.ncl.edu.tw/handle/76675057003629599118 Design and Implementation of a Twitter Data Collection and Management Service 實作推特社群媒體的資料蒐集與管理服務 Chou, Yu Chun 周玉駿 Master's thesis National Chengchi University (國立政治大學) Department of Computer Science (資訊科學學系) 101 The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in many areas, such as disaster information dissemination (crisis informatics), trend analysis and social network analysis. To help researchers focus on the analysis of data, it is necessary to construct a robust data collection and management platform. In this thesis, we investigate the issues and restrictions of current tweet data collection and storage, and present a modular design and implementation of a tweet collection and management platform based on Twitter’s API. Two salient features of our platform are event-job-based data collection tasks and an access token pool. Specifically, researchers may launch multiple jobs to collect the tweets related to an event with fewer duplicate tweets. By adopting a one-job, one-access-token approach, multiple jobs can run separately and do not affect each other's rate limits. In addition, because tweet bursts are common in many events, our platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database. To evaluate our platform, we conducted a few data collection experiments and compared it with two other popular tweet collection tools. The preliminary results show that our platform has certain advantages over them. Chen, Kung 陳恭 學位論文 ; thesis 64 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Chengchi University (國立政治大學) === Department of Computer Science (資訊科學學系) === 101 === The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in many areas, such as disaster information dissemination (crisis informatics), trend analysis and social network analysis. To help researchers focus on the analysis of data, it is necessary to construct a robust data collection and management platform. In this thesis, we investigate the issues and restrictions of current tweet data collection and storage, and present a modular design and implementation of a tweet collection and management platform based on Twitter’s API. Two salient features of our platform are event-job-based data collection tasks and an access token pool. Specifically, researchers may launch multiple jobs to collect the tweets related to an event with fewer duplicate tweets. By adopting a one-job, one-access-token approach, multiple jobs can run separately and do not affect each other's rate limits. In addition, because tweet bursts are common in many events, our platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database. To evaluate our platform, we conducted a few data collection experiments and compared it with two other popular tweet collection tools. The preliminary results show that our platform has certain advantages over them.
author2 Chen, Kung
author_facet Chen, Kung
Chou, Yu Chun
周玉駿
author Chou, Yu Chun
周玉駿
author_sort Chou, Yu Chun
title Design and Implementation of a Twitter Data Collection and Management Service
url http://ndltd.ncl.edu.tw/handle/76675057003629599118