Evaluating SPARQL queries on massive RDF datasets

Razen Al-Harbi, Ibrahim Abdelaziz, Panos Kalnis, Nikos Mamoulis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

23 Scopus citations

Abstract

Distributed RDF systems partition data across multiple compute nodes. Partitioning is typically based on heuristics that minimize inter-node communication, and it is performed in an initial data pre-processing phase. The resulting partitions are therefore static and do not adapt to changes in the query workload; as a result, existing systems cannot consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system that addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically reduces the startup cost while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems come online, and (iii) gracefully adapts to the query load, evaluating queries on billion-scale RDF data in under a second. In this demonstration, the audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.
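
The claim that hash partitioning favors parallel joins on subjects can be illustrated with a minimal sketch. This is not AdHash's implementation: the function names (`worker_for`, `partition_by_subject`, `local_subject_join`), the sample triples, and the use of CRC32 as the hash function are all assumptions made for illustration. The sketch only shows that when every triple is placed by a hash of its subject, a subject-subject join can be answered by each worker from its own data.

```python
from collections import defaultdict
import zlib


def worker_for(subject, num_workers):
    # Hypothetical placement rule: a deterministic hash of the subject.
    # CRC32 is a stand-in; the source does not name the hash function.
    return zlib.crc32(subject.encode("utf-8")) % num_workers


def partition_by_subject(triples, num_workers):
    # All triples that share a subject land on the same worker.
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[worker_for(s, num_workers)].append((s, p, o))
    return partitions


def local_subject_join(partition, pred_a, pred_b):
    # Evaluates the pattern { ?s pred_a ?x . ?s pred_b ?y } using only
    # the triples held by a single worker.
    objects_a = defaultdict(list)
    for s, p, o in partition:
        if p == pred_a:
            objects_a[s].append(o)
    return [
        (s, x, o)
        for s, p, o in partition
        if p == pred_b
        for x in objects_a.get(s, [])
    ]


# Made-up sample data, not taken from the paper.
triples = [
    ("alice", "worksFor", "kaust"),
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "hku"),
    ("carol", "worksFor", "kaust"),
    ("carol", "advisor", "bob"),
]

partitions = partition_by_subject(triples, num_workers=3)

# Each worker joins independently; the final answer is the union of the
# per-worker results, with no triples exchanged between workers.
results = sorted(
    row
    for part in partitions.values()
    for row in local_subject_join(part, "worksFor", "advisor")
)
```

Because both triple patterns share the variable `?s`, every matching pair of triples has the same subject and therefore sits on the same worker. A join on an object, by contrast, could pair triples stored on different workers, which is the case the abstract says the locality-aware planner and adaptive redistribution are meant to handle.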
Original language: English (US)
Title of host publication: Proceedings of the VLDB Endowment
Publisher: VLDB Endowment
Pages: 1848-1851
Number of pages: 4
State: Published - Aug 1 2015
