Cluster Cascades: Infer Multiple Information Networks Using Diffusion Data


Bibliographic Details
Main Authors: Ming-Hao Yang, 楊明皓
Other Authors: 陳銘憲
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/14697357916513369477
Description
Summary: Master's === National Taiwan University === Graduate Institute of Communication Engineering === 101 === Information diffusion and virus propagation are fundamental processes that often take place in networks. The problem of devising a strategy to facilitate or block such processes has received considerable attention. However, a major challenge is that transmission pathways are often hidden. In other words, one can observe only cascades, that is, the time stamps at which nodes are infected by an event, but not where or from whom each node was infected. Most research on this problem assumes a single underlying network over which cascades spread. In the real world, whether a transmission pathway for a contagion or a piece of information emerges depends on many factors, such as the topic of the information and the time when the information is first mentioned. Political news, for example, spreads differently from sports news. Political news itself also spreads differently over time: it spreads much faster during an election than usual. Therefore, it is hard to model diffusion processes with a single network when the information is of many kinds. In this thesis, we propose a probabilistic generative mixture model for the generation of cascades, that is, the time stamps at which nodes mention a piece of information. Our algorithm, MixCascades, clusters similar cascades and infers a corresponding underlying network for each cluster within the expectation-maximization framework. In addition, our algorithm determines the number of clusters automatically. On both synthetic and real cascade data, we show that our algorithm clusters cascades and recovers the underlying networks effectively.
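The record does not give the MixCascades likelihood, so the following is only a minimal sketch of the general idea of clustering cascades with expectation-maximization, not the thesis's algorithm. It assumes a much simpler stand-in model: each cascade is reduced to binary "node i was infected before node j" features, and a Bernoulli mixture is fitted over those features. The function names (`precedence_features`, `em_bernoulli_mixture`), the toy cascades, and the fixed number of clusters `k` are all assumptions made for illustration; unlike the thesis, this sketch does not choose the number of clusters automatically.

```python
import numpy as np

def precedence_features(cascade, n_nodes):
    """Flattened n_nodes x n_nodes 0/1 matrix: entry (i, j) is 1 when
    node i has an earlier time stamp than node j in this cascade."""
    x = np.zeros((n_nodes, n_nodes))
    for i, t_i in cascade.items():
        for j, t_j in cascade.items():
            if t_i < t_j:
                x[i, j] = 1.0
    return x.ravel()

def em_bernoulli_mixture(X, k, n_iter=50, seed=0, eps=1e-6):
    """EM for a k-component Bernoulli mixture over the rows of X
    (a stand-in model, not the thesis's transmission model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                      # mixing weights
    theta = rng.uniform(0.25, 0.75, size=(k, d))  # per-cluster feature probabilities
    for _ in range(n_iter):
        # E-step: responsibilities from log joint probabilities.
        log_p = (X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and feature probabilities.
        n_k = resp.sum(axis=0)
        pi = np.clip(n_k / n, eps, None)
        pi /= pi.sum()
        theta = np.clip((resp.T @ X) / np.maximum(n_k, eps)[:, None],
                        eps, 1.0 - eps)
    return pi, theta, resp

# Toy data (hypothetical): four cascades sweep nodes 0 -> 3,
# four sweep nodes 3 -> 0, at different speeds.
n_nodes, k = 4, 2
speeds = (1.0, 1.5, 2.0, 3.0)
forward = [{node: s * node for node in range(n_nodes)} for s in speeds]
backward = [{node: s * (n_nodes - 1 - node) for node in range(n_nodes)}
            for s in speeds]
cascades = forward + backward

X = np.array([precedence_features(c, n_nodes) for c in cascades])
pi, theta, resp = em_bernoulli_mixture(X, k)
labels = resp.argmax(axis=1)                     # hard cluster assignment
networks = theta.reshape(k, n_nodes, n_nodes)    # per-cluster precedence scores
print(labels)
```

Each slice `networks[c]` holds, for cluster `c`, the estimated probability that node i precedes node j. This is only a crude proxy for a diffusion network, since precedence also captures indirect paths; a model closer to the one described in the summary would replace the Bernoulli features with a likelihood over the time stamps themselves.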