Summary: | Active learning for networked data, which aims to predict the labels of the remaining nodes accurately from the known labels of a small subset of nodes, is attracting growing interest because it is especially useful when labeled data are expensive to obtain. However, most existing methods apply only to networks with assortative community structure, focus on node attribute data with links, or operate in single mode, which generally incurs higher learning and query costs than batch active learning. We therefore propose a batch mode active learning method that uses information-theoretic techniques and random walks to select which nodes to label. The proposed method requires only the network topology as input, does not need to know the number of blocks in advance, and makes no initial assumptions about how the blocks connect. We test our method on two types of networks, those with assortative structure and those with disassortative structure, and compare it with a single mode active learning method that is otherwise similar to ours, as well as with several simple batch mode active learning methods based on information-theoretic techniques and simple heuristics such as degree or betweenness centrality. The experimental results show that the proposed method significantly outperforms these baselines.
|