Dense matrix computations on NUMA architectures with distance-aware work stealing

Rabab Al-Omairy, Guillermo Miranda, Hatem Ltaief, Rosa M. Badia, Xavier Martorell, Jesus Labarta, David Keyes

Research output: Contribution to journal, Article, peer-review

18 Scopus citations


We employ the dynamic runtime system OmpSs to decrease the overhead of data motion in the now ubiquitous non-uniform memory access (NUMA), high-concurrency environment of multicore processors. The dense numerical linear algebra algorithms of Cholesky factorization and symmetric matrix inversion serve as representative benchmarks. Work stealing occurs within an innovative NUMA-aware scheduling policy that reduces data movement between NUMA nodes. The overall approach achieves separation of concerns by abstracting the complexity of the hardware from end users, so that high productivity can be achieved. On a large NUMA system, the approach outperforms existing state-of-the-art implementations by up to a factor of two for both the Cholesky factorization and the symmetric matrix inversion, while the OmpSs-enabled code remains strongly similar to its original sequential version.
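The abstract names distance-aware work stealing but does not spell out the policy. The sketch below is one plausible reading, not the OmpSs scheduler itself: each NUMA node owns a task queue, a worker takes local work first, and an idle worker steals from the nearest node that still has work. The names `pick_victim` and `next_task`, the task labels, and the distance matrix (in the style of the table printed by `numactl --hardware`) are all assumptions made for illustration.

```python
from collections import deque


def pick_victim(thief, queues, distance):
    """Return the closest node other than `thief` that has queued work, or None.

    `distance[i][j]` is an assumed NUMA distance between nodes i and j
    (smaller means closer). Ties are broken by the lower node index.
    """
    candidates = [n for n in range(len(queues)) if n != thief and queues[n]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (distance[thief][n], n))


def next_task(node, queues, distance):
    """Take local work if any; otherwise steal from the nearest busy node."""
    if queues[node]:
        return queues[node].popleft()  # owner takes from the front of its queue
    victim = pick_victim(node, queues, distance)
    if victim is None:
        return None  # no work anywhere
    return queues[victim].pop()  # thief takes from the back of the victim's queue


if __name__ == "__main__":
    # Hypothetical 4-node machine: nodes 0/1 and 2/3 are close pairs.
    distance = [
        [10, 16, 22, 22],
        [16, 10, 22, 22],
        [22, 22, 10, 16],
        [22, 22, 16, 10],
    ]
    # Hypothetical tile-kernel task labels queued per node.
    queues = [deque(), deque(["trsm_1"]), deque(["syrk_2", "gemm_2"]), deque()]
    for node in (0, 3, 2, 0):
        print(node, next_task(node, queues, distance))
```

Stealing from the nearest node first keeps a stolen task's data as close as possible to the core that runs it, which is the kind of reduction in inter-node data movement the abstract describes.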

Original language: English (US)
Pages (from-to): 49-72
Number of pages: 24
Journal: Supercomputing Frontiers and Innovations
Issue number: 1
State: Published - 2015


Keywords

  • Data locality
  • Dense matrix computations
  • Dynamic runtime systems
  • High performance computing
  • Non-uniform memory access
  • Software productivity
  • Work stealing

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics

