ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph

Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, Fuad Tarek Jamour

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

72 Citations (SciVal)

Abstract

Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches either suffer from load imbalance, or high communication and synchronization overheads. In this paper we propose ScaleMine; a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces a novel two-phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while collecting various statistics. The second phase computes the exact solution by employing the results of the approximation to achieve good load balance; prune the search space; generate efficient execution plans; and guide intra-task parallelism. Our experiments show that ScaleMine scales to 8,192 cores on a Cray XC40 (12× more than competitors); supports graphs with one billion edges (10× larger than competitors), and is at least an order of magnitude faster than existing solutions.
Original languageEnglish (US)
Title of host publicationSC16: International Conference for High Performance Computing, Networking, Storage and Analysis
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
Pages716-727
Number of pages12
ISBN (Electronic)9781467388153
ISBN (Print)9781467388153
DOIs
StatePublished - Mar 13 2017
Event2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016 - Salt Lake City, United States
Duration: Nov 13 2016Nov 18 2016

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Other

Other2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016
Country/TerritoryUnited States
CitySalt Lake City
Period11/13/1611/18/16

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Fingerprint

Dive into the research topics of 'ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph'. Together they form a unique fingerprint.

Cite this