TY - JOUR
T1 - Trajectory similarity join in spatial networks
AU - Shang, Shuo
AU - Chen, Lisi
AU - Wei, Zhewei
AU - Jensen, Christian S.
AU - Zheng, Kai
AU - Kalnis, Panos
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work is partially supported by KAUST, the National Natural Science Foundation of China (61402532, 61532018), Beijing Nova Program (xx2016078), and by the DiCyPS center, funded by Innovation Fund Denmark.
PY - 2017/9/7
Y1 - 2017/9/7
N2 - The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction.
With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight in the performance of the algorithm and demonstrates that is capable of outperforming a well-designed baseline algorithm by an order of magnitude.
AB - The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction.
With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight in the performance of the algorithm and demonstrates that is capable of outperforming a well-designed baseline algorithm by an order of magnitude.
UR - http://hdl.handle.net/10754/625506
UR - http://dl.acm.org/citation.cfm?doid=3137628.3137630
UR - http://www.scopus.com/inward/record.url?scp=85031432483&partnerID=8YFLogxK
U2 - 10.14778/3137628.3137630
DO - 10.14778/3137628.3137630
M3 - Article
SN - 2150-8097
VL - 10
SP - 1178
EP - 1189
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
ER -