TY - JOUR
T1 - ITISS: an efficient framework for querying big temporal data
AU - Chen, Zhongpu
AU - Yao, Bin
AU - Wang, Zhi-Jie
AU - Zhang, Wei
AU - Zheng, Kai
AU - Kalnis, Panos
AU - Tang, Feilong
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the NSFC (61872235, 61729202, 61832017, U1636210, U61811264, 61832013 and 61672351), and the National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502 and 2018YFB1004400).
PY - 2019/5/22
Y1 - 2019/5/22
N2 - In the real world, temporal data can be found in many applications, and its volume is growing rapidly. Managing and processing big temporal data efficiently and effectively is both important and challenging, owing to its large volume and the requirement for real-time responses. Processing big temporal data with a distributed system is a desirable choice, since a single-machine system usually has limited computing capacity. Nevertheless, existing distributed systems and methods are either disk-based or cannot support native queries, so they may not meet the demands of low latency and high throughput. To address these issues, this article proposes a new approach to handling big temporal data. Our approach is an In-memory based Two-level Index Solution in Spark, dubbed ITISS. The proposed framework is easy to understand and implement, without sacrificing effectiveness or efficiency. Based on this framework, this article develops dedicated algorithms for time travel, temporal aggregation, and temporal join queries. We have implemented our framework in Apache Spark, extended Apache Spark SQL with a declarative SQL interface that enables users to perform temporal queries with a few lines of SQL, and conducted extensive experiments to verify the performance of our solution. The experimental results, based on both real and synthetic datasets, consistently demonstrate that our proposed solution is efficient and competitive for processing big temporal data.
AB - In the real world, temporal data can be found in many applications, and its volume is growing rapidly. Managing and processing big temporal data efficiently and effectively is both important and challenging, owing to its large volume and the requirement for real-time responses. Processing big temporal data with a distributed system is a desirable choice, since a single-machine system usually has limited computing capacity. Nevertheless, existing distributed systems and methods are either disk-based or cannot support native queries, so they may not meet the demands of low latency and high throughput. To address these issues, this article proposes a new approach to handling big temporal data. Our approach is an In-memory based Two-level Index Solution in Spark, dubbed ITISS. The proposed framework is easy to understand and implement, without sacrificing effectiveness or efficiency. Based on this framework, this article develops dedicated algorithms for time travel, temporal aggregation, and temporal join queries. We have implemented our framework in Apache Spark, extended Apache Spark SQL with a declarative SQL interface that enables users to perform temporal queries with a few lines of SQL, and conducted extensive experiments to verify the performance of our solution. The experimental results, based on both real and synthetic datasets, consistently demonstrate that our proposed solution is efficient and competitive for processing big temporal data.
UR - http://hdl.handle.net/10754/656199
UR - http://link.springer.com/10.1007/s10707-019-00362-1
UR - http://www.scopus.com/inward/record.url?scp=85066905007&partnerID=8YFLogxK
U2 - 10.1007/s10707-019-00362-1
DO - 10.1007/s10707-019-00362-1
M3 - Article
SN - 1384-6175
VL - 24
SP - 27
EP - 59
JO - GeoInformatica
JF - GeoInformatica
IS - 1
ER -