TY - JOUR
T1 - Accurately Estimating User Cardinalities and Detecting Super Spreaders over Time
AU - Jia, Peng
AU - Wang, Pinghui
AU - Zhang, Yuchao
AU - Zhang, Xiangliang
AU - Tao, Jing
AU - Ding, Jianwei
AU - Guan, Xiaohong
AU - Towsley, Don
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: The research presented in this paper is supported in part by National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (U1736205, 61603290), Shenzhen Basic Research Grant (JCYJ20170816100819428), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034).
PY - 2020
Y1 - 2020
N2 - Online monitoring user cardinalities in graph streams is fundamental for many applications such as anomaly detection. These graph streams may contain edge duplicates and have a large number of user-item pairs, which makes it infeasible to exactly compute user cardinalities due to limited computational and memory resources. Existing methods are designed to approximately estimate user cardinalities, but their accuracy highly depends on complex parameters and they cannot provide anytime-available estimation. To address these problems, we develop novel bit/register sharing algorithms, which use a bit/register array to build a compact sketch of all users' connected items. Our algorithms exploit the dynamic properties of the bit/register arrays (e.g., the fraction of zero bits in the bit array) to significantly improve the estimation accuracy, and have low time complexity O(1) to update the estimations for a new user-item pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. Furthermore, we extend our methods to detect super spreaders with large cardinalities in real-time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
AB - Online monitoring user cardinalities in graph streams is fundamental for many applications such as anomaly detection. These graph streams may contain edge duplicates and have a large number of user-item pairs, which makes it infeasible to exactly compute user cardinalities due to limited computational and memory resources. Existing methods are designed to approximately estimate user cardinalities, but their accuracy highly depends on complex parameters and they cannot provide anytime-available estimation. To address these problems, we develop novel bit/register sharing algorithms, which use a bit/register array to build a compact sketch of all users' connected items. Our algorithms exploit the dynamic properties of the bit/register arrays (e.g., the fraction of zero bits in the bit array) to significantly improve the estimation accuracy, and have low time complexity O(1) to update the estimations for a new user-item pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. Furthermore, we extend our methods to detect super spreaders with large cardinalities in real-time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
UR - http://hdl.handle.net/10754/661680
UR - https://ieeexplore.ieee.org/document/9007395/
U2 - 10.1109/TKDE.2020.2975625
DO - 10.1109/TKDE.2020.2975625
M3 - Article
SN - 1041-4347
SP - 1
EP - 1
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -