Summary: | Graphs are used in numerous applications to model real-world systems and phenomena. The ever increasing size of graphs makes them difficult to query and analyze. In this paper, we propose HcRPC, a Highly compact Reachability Preserving Graph compression algorithm with Corrections, which is capable of preserving the reachability relations between the nodes in original graph. The highly compressed representation of a given graph consists of a compressed graph and a set of corrections. The original graph is compressed on the basis of equivalence class obtained via the reachability relations between nodes in the original graph. In the compressed graph, each node corresponds to a set of nodes from the original graph with similar ancestors and descendants, and each edge represents linkage between the original nodes in any two node sets. The corrections portion specifies the set of corrections, including equivalent class-node corrections and node-node corrections. MinHash technique is utilized to speed up checking whether equivalence classes are structure-similar and the pair of equivalence classes with high similarity are thus merged to acquire a highly compressed graph. Besides, we develop an algorithm for preserving compressed graph with a set of corrections in response to changes to the original graph. We evaluate our algorithms on real-life graph data sets and the results indicate that graph data sets can be highly compressed while preserving the reachability relations between nodes.
|