Microblog Hot Spot Mining Based on PAM Probabilistic Topic Model

Microblogs are short texts that carry limited information, which makes topic mining more difficult. This paper proposes using the PAM (Pachinko Allocation Model) probabilistic topic model to extract a generative model of a text’s implicit themes for microblog hot spot mining. First, thr...

Bibliographic Details
Main Authors: Zheng Yaxin, Ling Liu
Format: Article
Language: English
Published: EDP Sciences 2015-01-01
Series: MATEC Web of Conferences
Subjects:
Online Access: http://dx.doi.org/10.1051/matecconf/20152201062
id doaj-f14644b56c0c42f6b164b7a656a1b98f
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Zheng Yaxin
Ling Liu
title Microblog Hot Spot Mining Based on PAM Probabilistic Topic Model
publisher EDP Sciences
series MATEC Web of Conferences
issn 2261-236X
publishDate 2015-01-01
description Microblogs are short texts that carry limited information, which makes topic mining more difficult. This paper proposes using the PAM (Pachinko Allocation Model) probabilistic topic model to extract a generative model of a text’s implicit themes for microblog hot spot mining. First, three categories of microblog and the main contribution of this paper are presented. Second, four topic models are explained in turn, and the PAM model is introduced in detail in terms of how it generates a document, the accuracy of document classification, and topic correlation in PAM. Finally, MapReduce is described. Although the number of microblogs and the number of contactors are huge, the total number of words is relatively small. With MapReduce, microblog data are split by contactor; the document-topic count matrix and the contactor-topic count matrix can be stored locally, while the word-topic count matrix must be stored globally. Thus, hot spot mining can be achieved on the basis of the PAM probabilistic topic model.
topic microblog
hot spot
PAM probabilistic topic model
MapReduce
url http://dx.doi.org/10.1051/matecconf/20152201062
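
The storage split described in the abstract (document-topic and contactor-topic counts kept locally per contactor, word-topic counts merged globally) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the input tuple layout, and the toy data are hypothetical, and topics are assigned at random as placeholders instead of being sampled from a PAM model.

```python
from collections import Counter, defaultdict
import random


def map_partition(docs, num_topics, seed=0):
    """Map step for one contactor's microblogs (hypothetical helper).

    Assigns a random placeholder topic to every word (NOT PAM inference)
    and builds the count tables for this partition.
    """
    rng = random.Random(seed)
    doc_topic = defaultdict(Counter)  # stays local: doc id -> topic counts
    contactor_topic = Counter()       # stays local: topic counts for this contactor
    word_topic = Counter()            # partial (word, topic) counts, merged globally later
    for doc_id, text in docs:
        for word in text.lower().split():
            topic = rng.randrange(num_topics)
            doc_topic[doc_id][topic] += 1
            contactor_topic[topic] += 1
            word_topic[(word, topic)] += 1
    return doc_topic, contactor_topic, word_topic


def reduce_word_topic(partials):
    """Reduce step: sum the partial word-topic counts into one global table."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(microblogs, num_topics=3):
    """Split (contactor, doc_id, text) records by contactor, then map and reduce."""
    by_contactor = defaultdict(list)
    for contactor, doc_id, text in microblogs:
        by_contactor[contactor].append((doc_id, text))

    local_tables = {}  # contactor -> (doc_topic, contactor_topic)
    partials = []
    for i, (contactor, docs) in enumerate(sorted(by_contactor.items())):
        doc_topic, contactor_topic, word_topic = map_partition(docs, num_topics, seed=i)
        local_tables[contactor] = (doc_topic, contactor_topic)
        partials.append(word_topic)
    return local_tables, reduce_word_topic(partials)
```

Only the word-topic table needs to be shared across partitions in this sketch, which matches the abstract's observation that the vocabulary is small relative to the number of microblogs and contactors.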