TY - GEN
T1 - Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities Over Time
AU - Wang, Pinghui
AU - Jia, Peng
AU - Zhang, Xiangliang
AU - Tao, Jing
AU - Guan, Xiaohong
AU - Towsley, Don
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: The research presented in this paper is supported in part by National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (U1301254, 61603290, 61602371), the Ministry of Education&China Mobile Research Fund (MCM20160311), the Natural Science Foundation of Jiangsu Province (SBK2014021758), 111 International Collaboration Program of China, the Prospective Joint Research of Industry-Academia-Research Joint Innovation Funding of Jiangsu Province (BY2014074), Shenzhen Basic Research Grant (JCYJ20160229195940462, JCYJ20170816100819428), China Postdoctoral Science Foundation (2015M582663), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034).
PY - 2019/4
Y1 - 2019/4
N2 - Online monitoring user cardinalities (or degrees) in graph streams is fundamental for many applications. For example in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct visited websites) are monitored to report network anomalies. These real-world graph streams may contain user-item duplicates and have a huge number of distinct user-item pairs, therefore, it is infeasible to exactly compute user cardinalities when memory and computation resources are limited. Existing methods are designed to approximately estimate user cardinalities, whose accuracy highly depends on parameters that are not easy to set. Moreover, these methods cannot provide anytime-available estimation, as the user cardinalities are computed at the end of the data stream. Realtime applications such as anomaly detection require that user cardinalities are estimated on the fly. To address these problems, we develop novel bit and register sharing algorithms, which use a bit array and a register array to build a compact sketch of all users' connected items respectively. Compared with previous bit and register sharing methods, our algorithms exploit the dynamic properties of the bit and register arrays (e.g., the fraction of zero bits in the bit array at each time) to significantly improve the estimation accuracy, and have low time complexity (O(1)) to update the estimations each time they observe a new useritem pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
AB - Online monitoring user cardinalities (or degrees) in graph streams is fundamental for many applications. For example in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct visited websites) are monitored to report network anomalies. These real-world graph streams may contain user-item duplicates and have a huge number of distinct user-item pairs, therefore, it is infeasible to exactly compute user cardinalities when memory and computation resources are limited. Existing methods are designed to approximately estimate user cardinalities, whose accuracy highly depends on parameters that are not easy to set. Moreover, these methods cannot provide anytime-available estimation, as the user cardinalities are computed at the end of the data stream. Realtime applications such as anomaly detection require that user cardinalities are estimated on the fly. To address these problems, we develop novel bit and register sharing algorithms, which use a bit array and a register array to build a compact sketch of all users' connected items respectively. Compared with previous bit and register sharing methods, our algorithms exploit the dynamic properties of the bit and register arrays (e.g., the fraction of zero bits in the bit array at each time) to significantly improve the estimation accuracy, and have low time complexity (O(1)) to update the estimations each time they observe a new useritem pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
UR - http://hdl.handle.net/10754/655955
UR - https://ieeexplore.ieee.org/document/8731345/
UR - http://www.scopus.com/inward/record.url?scp=85067932395&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2019.00101
DO - 10.1109/ICDE.2019.00101
M3 - Conference contribution
SN - 9781538674741
SP - 1094
EP - 1105
BT - 2019 IEEE 35th International Conference on Data Engineering (ICDE)
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -