TY - GEN
T1 - Cluster-Based Subscription Matching for Geo-Textual Data Streams
AU - Chen, Lisi
AU - Shang, Shuo
AU - Zheng, Kai
AU - Kalnis, Panos
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work is supported in part by grants awarded by the National Natural Science Foundation of China (NSFC) (No. 61832017, 61836007, 61532018).
PY - 2019/4
Y1 - 2019/4
AB - Geo-textual data that contain spatial, textual, and temporal information are being generated at a very high rate. These geo-textual data cover a wide range of topics. Users may be interested in receiving local popular topics from geo-textual messages. We study the cluster-based subscription matching (CSM) problem. Given a stream of geo-textual messages, we maintain up-to-date clustering results based on a threshold-based online clustering algorithm. Based on the clustering result, we feed subscribers with their preferred geo-textual message clusters according to their specified keywords and location. Moreover, we summarize each cluster by selecting a set of representative messages. The CSM problem considers spatial proximity, textual relevance, and message freshness during the clustering, cluster feeding, and summarization processes. To solve the CSM problem, we propose a novel solution to cluster, feed, and summarize a stream of geo-textual messages efficiently. We evaluate the efficiency of our solution on two real-world datasets and the experimental results demonstrate that our solution is capable of high efficiency compared with baselines.
UR - http://hdl.handle.net/10754/656131
UR - https://ieeexplore.ieee.org/document/8731608/
UR - http://www.scopus.com/inward/record.url?scp=85067952675&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2019.00084
DO - 10.1109/ICDE.2019.00084
M3 - Conference contribution
SN - 9781538674741
SP - 890
EP - 901
BT - 2019 IEEE 35th International Conference on Data Engineering (ICDE)
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -