Lusail: A System for Querying Linked Data at Scale

Ibrahim Abdelaziz, Essam Mansour, Mourad Ouzzani, Ashraf Aboulnaga, Panos Kalnis

Research output: Contribution to journal › Article › peer-review

Abstract

The RDF data model allows publishing interlinked RDF datasets, where each dataset is independently maintained and is queryable via a SPARQL endpoint. Many applications would benefit from querying the resulting large, decentralized, geo-distributed graph through a federated SPARQL query processor. A crucial factor for good performance in federated query processing is pushing as much computation as possible to the local endpoints. Surprisingly, existing federated SPARQL engines are not effective at this task since they rely only on schema information. Consequently, they cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a scalable and efficient federated SPARQL system for querying large RDF graphs that are geo-distributed on different endpoints. Lusail uses a novel query rewriting algorithm to push computation to the local endpoints by relying on information about the RDF instances and not only the schema. The query rewriting algorithm has the additional advantage of exposing parallelism in query processing, which Lusail exploits through advanced scheduling at query run time. Our experiments on billions of triples of real and synthetic data show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.
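
The abstract's point about pushing computation to the endpoints can be made concrete with a toy sketch. The code below is not Lusail's query rewriting algorithm; it only contrasts two hypothetical strategies for a two-pattern join over a single in-memory "endpoint" and counts how many rows each one would ship to the federated query processor. All function names, predicates, and data are assumptions made for illustration.

```python
# Toy illustration only: NOT Lusail's algorithm. An "endpoint" is an in-memory
# list of (subject, predicate, object) triples; all names are hypothetical.

def match(triples, predicate):
    """Return (subject, object) pairs for one triple pattern ?s <predicate> ?o."""
    return [(s, o) for s, p, o in triples if p == predicate]

def join_on_subject(left, right):
    """Join two lists of (subject, object) pairs on the shared subject."""
    index = {}
    for s, o in right:
        index.setdefault(s, []).append(o)
    return [(s, o1, o2) for s, o1 in left for o2 in index.get(s, [])]

def mediator_join(endpoint, p1, p2):
    """Ship both pattern results to the mediator, then join there."""
    left, right = match(endpoint, p1), match(endpoint, p2)
    transferred = len(left) + len(right)  # every matching row crosses the network
    return join_on_subject(left, right), transferred

def pushed_join(endpoint, p1, p2):
    """Evaluate the join at the endpoint and ship only the joined rows."""
    rows = join_on_subject(match(endpoint, p1), match(endpoint, p2))
    return rows, len(rows)

endpoint = [
    ("alice", "worksAt", "uniA"),
    ("bob", "worksAt", "uniA"),
    ("carol", "worksAt", "uniB"),
    ("alice", "authorOf", "paper1"),
    ("dave", "authorOf", "paper2"),
]

naive_rows, naive_cost = mediator_join(endpoint, "worksAt", "authorOf")
pushed_rows, pushed_cost = pushed_join(endpoint, "worksAt", "authorOf")
print(naive_rows == pushed_rows, naive_cost, pushed_cost)
```

Both strategies return the same answer, but the pushed variant transfers only the joined rows. In a real federation, the join can be delegated this way only when the endpoint can answer it completely on its own; per the abstract, Lusail makes that decision using information about the RDF instances and not only the schema.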
Original language: English (US)
Pages (from-to): 485-498
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 11
Issue number: 4
State: Published - Dec 1 2017
