Summary: | This paper presents a novel meta-algorithm, Partition-Merge (PM), which takes an existing centralized algorithm for graph computation and makes it distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then <italic>stitches</italic> the resulting solutions to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF) and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems run in time linear in the number of nodes in the graph. Furthermore, PM achieves performance comparable to, or even better than, that of the centralized algorithms as long as the graph has the polynomial growth property. More precisely, if the centralized algorithm is a <inline-formula> <tex-math notation="LaTeX">$\mathcal {C}$ </tex-math></inline-formula>-factor approximation with constant <inline-formula> <tex-math notation="LaTeX">$\mathcal {C}\ge 1$ </tex-math></inline-formula>, the resulting distributed algorithm is a <inline-formula> <tex-math notation="LaTeX">$(\mathcal {C}+\delta)$ </tex-math></inline-formula>-factor approximation for any small <inline-formula> <tex-math notation="LaTeX">$\delta >0$ </tex-math></inline-formula>; and even if the centralized algorithm is a non-constant (e.g., logarithmic) factor approximation, the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm. To show the efficiency of our algorithm, we conducted extensive experiments on both real-world and synthetic networks. 
The experiments demonstrate that the PM algorithm provides a good trade-off between accuracy and running time.
|
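The partition, solve, and stitch steps described in the summary can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `random_ball_partition` and `partition_merge`, the uniform choice of ball radius, the adjacency-dictionary graph representation, and the naive merge that simply takes the union of the per-partition solutions. The paper's actual randomized partitioning scheme and stitching procedure may differ, in particular for modularity optimization.

```python
import random
from collections import deque


def random_ball_partition(adj, max_radius, rng):
    """Split the nodes into disjoint pieces by growing BFS balls.

    Stand-in for a randomized partitioning scheme (an assumption, not the
    paper's scheme): visit nodes in random order and, around each node that
    is still unassigned, carve out a ball of random radius.
    """
    unassigned = set(adj)
    order = list(adj)
    rng.shuffle(order)
    parts = []
    for center in order:
        if center not in unassigned:
            continue
        radius = rng.randint(1, max_radius)  # hypothetical radius distribution
        unassigned.discard(center)
        ball = {center}
        frontier = deque([(center, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if dist == radius:
                continue  # do not grow past the chosen radius
            for nb in adj[node]:
                if nb in unassigned:
                    unassigned.discard(nb)
                    ball.add(nb)
                    frontier.append((nb, dist + 1))
        parts.append(ball)
    return parts


def partition_merge(adj, solve, max_radius=2, seed=0):
    """Run `solve` on each piece and merge the partial solutions.

    `solve` is any centralized routine mapping a sub-adjacency dict to a
    dict {node: label}. The merge here is a plain union of those dicts.
    """
    rng = random.Random(seed)
    solution = {}
    for part in random_ball_partition(adj, max_radius, rng):
        # Induced subgraph: keep only edges with both endpoints in the piece.
        sub_adj = {u: [v for v in adj[u] if v in part] for u in part}
        solution.update(solve(sub_adj))
    return solution


# Toy usage: a path graph on 6 nodes and a placeholder "solver" that labels
# every node of a piece with the smallest node id in that piece.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
labels = partition_merge(path, lambda sub: {u: min(sub) for u in sub})
```

Because the pieces are disjoint, the calls to `solve` are independent of one another and could be dispatched to separate workers; the sequential loop above is kept only for brevity.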