TY - GEN
T1 - Matrix Algebra Framework for Portable, Scalable and Efficient Query Engines for RDF Graphs
AU - Jamour, Fuad Tarek
AU - Abdelaziz, Ibrahim
AU - Chen, Yuanzhao
AU - Kalnis, Panos
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank the anonymous reviewers and our paper’s shepherd Jens Teubner for their feedback. We also thank the authors of Wukong [48] for their responsiveness and help with running and understanding their system. For supercomputer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia.
PY - 2019/3/22
Y1 - 2019/3/22
AB - Existing query engines for RDF graphs follow one of two design paradigms: relational or graph-based. We explore sparse matrix algebra as a third paradigm and propose MAGiQ: a framework for implementing SPARQL query engines that are portable on various hardware architectures, scalable over thousands of compute nodes, and efficient for very large RDF datasets. MAGiQ represents the RDF graph as a sparse matrix and defines a domain-specific language of algebraic operations. SPARQL queries are translated into matrix algebra programs that are oblivious to the underlying computing infrastructure. Existing matrix algebra libraries, optimized for each particular architecture, are called to execute the program and handle the performance issues. We present three case studies of matrix algebra back-end libraries: SuiteSparse, Matlab, and CombBLAS; we demonstrate how MAGiQ can effortlessly be ported on a variety of architectures such as Intel CPUs, NVIDIA GPUs, and Cray XC40 supercomputers. Our experiments on large-scale real and synthetic datasets show that MAGiQ performs comparably to or better than existing specialized SPARQL query engines for data-intensive queries, scales to very large computing infrastructures, and handles datasets with up to 512 billion triples.
UR - http://hdl.handle.net/10754/652464
UR - https://dl.acm.org/citation.cfm?doid=3302424.3303962
UR - http://www.scopus.com/inward/record.url?scp=85063897376&partnerID=8YFLogxK
U2 - 10.1145/3302424.3303962
DO - 10.1145/3302424.3303962
M3 - Conference contribution
SN - 9781450362818
BT - Proceedings of the Fourteenth EuroSys Conference 2019 - EuroSys '19
PB - Association for Computing Machinery (ACM)
ER -