Summary: | Master's === National Taiwan University === Graduate Institute of Communication Engineering === 101 === Information diffusion and virus propagation are fundamental processes
that often take place in networks. The problem of devising a strategy to
facilitate or block such processes has received considerable attention. However,
a major challenge is that transmission pathways are often hidden. In other
words, one can observe only cascades, that is, the time stamps at which nodes are
infected by events, but cannot know where or by whom the nodes were infected.
Most studies dealing with this problem assume a single underlying network
over which cascades spread. In the real world, whether the transmission pathways
of a contagion (a piece of information) emerge depends on many
factors, such as the topic of the information and the time when the information
is first mentioned. Political news, for example, spreads in a different
way from sports news. Political news itself spreads differently over time:
it spreads much faster during an election than at other times. Therefore, it is
hard to model the diffusion processes with a single network when
the information is of many kinds.
In this thesis, we propose a probabilistic generative mixture model that
models the generation of cascades, that is, the time stamps at which nodes mention
a piece of information. Our algorithm, MixCascades, clusters similar cascades and
infers a corresponding underlying network for each cluster within the
expectation-maximization framework. In addition, the algorithm determines the
number of clusters automatically. On both synthetic and real cascade data, we
show that our algorithm clusters cascades and recovers the underlying
networks effectively.
|