Design and Implementation of a Twitter Data Collection and Management Service
Master's === National Chengchi University === Department of Computer Science === 101 === The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing, and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in...
Main Authors: | Chou, Yu Chun, 周玉駿 |
---|---|
Other Authors: | Chen, Kung |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/76675057003629599118 |
id |
ndltd-TW-101NCCU5394033 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-101NCCU5394033 2016-09-25T04:04:25Z http://ndltd.ncl.edu.tw/handle/76675057003629599118 Design and Implementation of a Twitter Data Collection and Management Service 實作推特社群媒體的資料蒐集與管理服務 Chou, Yu Chun 周玉駿 Master's National Chengchi University Department of Computer Science 101 The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing, and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in many areas, such as disaster information dissemination (crisis informatics), trend analysis, and social network analysis. To help researchers focus on data analysis, it is necessary to construct a robust data collection and management platform. In this thesis, we investigate the issues and restrictions of current tweet data collection and storage, and present a modular design and implementation of a tweet collection and management platform based on Twitter’s API. Two salient features of our platform are event-job based data collection tasks and an access token pool. Specifically, researchers may launch multiple jobs to collect the tweets related to an event with fewer duplicate tweets. By adopting a one-job, one-access-token approach, multiple jobs can run separately without affecting each other’s rate limits. In addition, because tweet bursts are common in many events, our platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database. To evaluate our platform, we conducted a few data collection experiments and compared it with two other popular tweet collection tools. The preliminary results show that our platform has certain advantages over them. Chen, Kung 陳恭 thesis 64 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === National Chengchi University === Department of Computer Science === 101 === The rise of social media, such as Twitter, has significantly influenced the mode of communication in modern society. By collecting, storing, and analyzing the massive amount of user interaction data from social media, researchers can conduct more in-depth work in many areas, such as disaster information dissemination (crisis informatics), trend analysis, and social network analysis. To help researchers focus on data analysis, it is necessary to construct a robust data collection and management platform.
In this thesis, we investigate the issues and restrictions of current tweet data collection and storage, and present a modular design and implementation of a tweet collection and management platform based on Twitter’s API. Two salient features of our platform are event-job based data collection tasks and an access token pool. Specifically, researchers may launch multiple jobs to collect the tweets related to an event with fewer duplicate tweets. By adopting a one-job, one-access-token approach, multiple jobs can run separately without affecting each other’s rate limits. In addition, because tweet bursts are common in many events, our platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database.
To evaluate our platform, we conducted a few data collection experiments and compared it with two other popular tweet collection tools. The preliminary results show that our platform has certain advantages over them.
|
author2 |
Chen, Kung |
author_facet |
Chen, Kung Chou, Yu Chun 周玉駿 |
author |
Chou, Yu Chun 周玉駿 |
author_sort |
Chou, Yu Chun |
title |
Design and Implementation of a Twitter Data Collection and Management Service |
url |
http://ndltd.ncl.edu.tw/handle/76675057003629599118 |
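The one-job, one-access-token approach described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `AccessTokenPool`, its methods, and the token strings are hypothetical, and the sketch assumes only that each token is leased to at most one collection job at a time so that jobs never share a rate limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessTokenPool:
    """Hypothetical pool that leases each access token to at most one job."""

    free: List[str]                                        # tokens not in use
    leased: Dict[str, str] = field(default_factory=dict)   # job_id -> token

    def acquire(self, job_id: str) -> Optional[str]:
        """Return the job's token, leasing a free one if it has none yet.

        Returns None when every token is already held by another job.
        """
        if job_id in self.leased:
            return self.leased[job_id]
        if not self.free:
            return None
        token = self.free.pop()
        self.leased[job_id] = token
        return token

    def release(self, job_id: str) -> None:
        """Return the job's token to the pool; a no-op if it holds none."""
        token = self.leased.pop(job_id, None)
        if token is not None:
            self.free.append(token)


# Two jobs for the same event each get their own token (placeholder values).
pool = AccessTokenPool(free=["token-1", "token-2"])
token_a = pool.acquire("event-42/job-a")
token_b = pool.acquire("event-42/job-b")
```

Because a token is never shared, requests made by one job count only against that job's token. A third job requesting a token here would receive `None` until one of the first two calls `release`.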