Summary: | Master's === National Chiao Tung University === Department of Electrical Engineering === 106 === Network representation learning, which embeds large information networks into low-dimensional vector spaces, has been widely studied for homogeneous networks. The resulting latent representations support data analysis tasks such as visualizing an entire network, classifying nodes, and detecting communities, so the representation plays a crucial role in those tasks. Heterogeneous networks, which contain hidden features not available in homogeneous networks, are less studied. One straightforward approach is to treat a heterogeneous network as a homogeneous one and obtain its representation using existing algorithms, but data loss and computational inefficiency are the bottlenecks of such methods. We therefore first use metapaths to highlight meaningful paths, so that pairs of nodes that are close in the network are also near each other in the representation space. Because nodes differ in their importance to representation learning, we then apply landmark selection, which assigns the nodes a priority order; high-priority nodes are given more chances to train their representations. Our landmark selection focuses on the distribution of the starting nodes of each walk. We use degree centrality as the criterion for determining landmarks, ranking nodes by the number of edges linked to them. The effectiveness of both methods is verified through multi-label classification results in terms of Micro-F1 and Macro-F1 scores. Metapaths outperform conventional homogeneous representation methods, and landmark selection improves the results further.
|
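The two ideas in the summary, metapath-guided walks and degree-based landmark selection of starting nodes, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the toy author/paper node types, and the choice to sample starting nodes with probability proportional to degree are all assumptions made for illustration; the summary only states that nodes are ranked by degree and that high-priority nodes get more training chances.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def rank_by_degree(adj):
    """Return nodes sorted by degree, highest first (ties broken by name)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


def landmark_starts(adj, node_type, start_type, num_walks, rng):
    """Pick walk starting nodes of type `start_type`.

    Assumption: a node is sampled with probability proportional to its
    degree, so high-degree "landmarks" start more walks.
    """
    nodes = [n for n in rank_by_degree(adj) if node_type[n] == start_type]
    weights = [len(adj[n]) for n in nodes]
    return rng.choices(nodes, weights=weights, k=num_walks)


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only follows the node-type pattern in `metapath`.

    `metapath` is a cyclic pattern such as ["A", "P", "A"], whose first
    and last types match so the pattern can repeat.
    """
    assert node_type[start] == metapath[0]
    walk = [start]
    period = len(metapath) - 1
    while len(walk) < walk_length:
        next_type = metapath[(len(walk) - 1) % period + 1]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == next_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk
```

In a setup like this, the walks produced from the sampled starting nodes would typically be fed to a skip-gram style model to learn the embeddings; that training step is omitted here.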