TY - JOUR
T1 - Approximately Counting Butterflies in Large Bipartite Graph Streams
AU - Li, Rundong
AU - Wang, Pinghui
AU - Jia, Peng
AU - Zhang, Xiangliang
AU - Zhao, Junzhou
AU - Tao, Jing
AU - Yuan, Ye
AU - Guan, Xiaohong
N1 - KAUST Repository Item: Exported on 2021-03-09
PY - 2021
Y1 - 2021
N2 - Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., 2×2 bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, they suffer from high time and space complexity when the bipartite graph of interest is given as a stream of edges. Although there are methods for approximately counting butterflies from bipartite graph streams, they suffer from either low accuracy or high time complexity. Therefore, it is still a challenge to accurately estimate butterfly counts from bipartite graph streams in a short time. To address this issue, we develop novel algorithms by exploiting the bipartite nature, which subtly integrates sampling and sketching techniques. We provide accurate estimators for butterfly counts and derive simple yet exact formulas for bounding their errors. We also conduct extensive experiments on a variety of real-world large bipartite graphs. Experimental results demonstrate that our algorithms are up to 20.0 times more accurate and up to 286.3 times faster than state-of-the-art methods under the same memory usage.
AB - Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., 2×2 bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, they suffer from high time and space complexity when the bipartite graph of interest is given as a stream of edges. Although there are methods for approximately counting butterflies from bipartite graph streams, they suffer from either low accuracy or high time complexity. Therefore, it is still a challenge to accurately estimate butterfly counts from bipartite graph streams in a short time. To address this issue, we develop novel algorithms by exploiting the bipartite nature, which subtly integrates sampling and sketching techniques. We provide accurate estimators for butterfly counts and derive simple yet exact formulas for bounding their errors. We also conduct extensive experiments on a variety of real-world large bipartite graphs. Experimental results demonstrate that our algorithms are up to 20.0 times more accurate and up to 286.3 times faster than state-of-the-art methods under the same memory usage.
UR - http://hdl.handle.net/10754/667814
UR - https://ieeexplore.ieee.org/document/9366975/
U2 - 10.1109/TKDE.2021.3062987
DO - 10.1109/TKDE.2021.3062987
M3 - Article
SN - 2326-3865
SP - 1
EP - 1
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -