Cholesky Factorization of Tile Low Rank Matrices on GPUs

Wajih Boukaram*, Stefano Zampini, George Turkiyyah, David Keyes

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Tile low rank (TLR) representations of dense matrices partition them into blocks of roughly uniform size, where off-diagonal tiles are compressed and stored in low rank factorizations. They offer an attractive representation for many data-sparse dense operators that appear in practical applications, since substantial compression and a much smaller memory footprint can be achieved. Despite their utility, however, there are currently only a few high performance algorithms that can generate their Cholesky factorizations and operate on them efficiently, especially on GPUs. The difficulties in achieving high performance when factoring TLR matrices come from the expensive compression operations that must be performed during the factorization process and the irregular rank distribution of the tiles that requires an adaptive work pattern for the processing cores. In this work, we describe an algorithm that overcomes these limitations. Our algorithm has several new features. It always works in the compressed representation of the tiles. It compresses every tile in the output once only. It uses GEMM-rich adaptive randomized approximation for the compression. It also uses dynamic batched operations on the GPU to manage the irregular workload due to differing ranks among the output tiles. The resulting algorithm achieves substantial performance, as we demonstrate on sample matrices.
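To make the compression step concrete, the following is a minimal CPU sketch of an adaptive randomized low rank approximation of a single off-diagonal tile, in the style of the standard randomized range finder. It is not the authors' GPU implementation: the names `compress_tile`, `matmul`, and `fro_norm`, the `block` and `tol` parameters, and the test kernel are all assumptions made for illustration. The point it shows is that the rank is not chosen in advance; random samples are added a few at a time until the residual falls below a relative tolerance, and almost all of the work is matrix-matrix products.

```python
import math
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def fro_norm(A):
    """Frobenius norm of a list-of-rows matrix."""
    return math.sqrt(sum(x * x for row in A for x in row))


def compress_tile(tile, tol, block=2, seed=0):
    """Return (Q, B) with ||tile - Q @ B||_F <= tol * ||tile||_F.

    Hypothetical helper: Q (m x k) has orthonormal columns and B = Q^T @ tile.
    The rank k grows by up to `block` per sweep until the tolerance is met.
    """
    rng = random.Random(seed)
    m, n = len(tile), len(tile[0])
    scale = fro_norm(tile)
    target = tol * scale
    basis = []                            # each row is one orthonormal column of Q
    residual = [row[:] for row in tile]
    while fro_norm(residual) > target and len(basis) < min(m, n):
        # Sample the range of the residual with a Gaussian test matrix.
        omega = [[rng.gauss(0.0, 1.0) for _ in range(block)] for _ in range(n)]
        samples = zip(*matmul(residual, omega))   # `block` vectors of length m
        added = 0
        for y in samples:
            y = list(y)
            for _ in range(2):            # Gram-Schmidt, run twice for stability
                for q in basis:
                    c = sum(qi * yi for qi, yi in zip(q, y))
                    y = [yi - c * qi for yi, qi in zip(y, q)]
            nrm = math.sqrt(sum(yi * yi for yi in y))
            if nrm > 1e-12 * scale and len(basis) < min(m, n):
                basis.append([yi / nrm for yi in y])
                added += 1
        if added == 0:
            break                         # no new directions were found
        # Recompute the residual: tile - Q (Q^T tile).
        B = matmul(basis, tile)
        Q = [list(col) for col in zip(*basis)]
        QB = matmul(Q, B)
        residual = [[t - a for t, a in zip(tr, ar)] for tr, ar in zip(tile, QB)]
    Q = [list(col) for col in zip(*basis)]
    return Q, matmul(basis, tile)


# Usage: a 16 x 16 tile of the smooth kernel 1 / (1 + |x - y|) evaluated on two
# well-separated point sets (an assumed example of a compressible tile).
xs = [i / 16 for i in range(16)]
ys = [3.0 + j / 16 for j in range(16)]
tile = [[1.0 / (1.0 + abs(x - y)) for y in ys] for x in xs]
Q, B = compress_tile(tile, tol=1e-6)
approx = matmul(Q, B)
err = fro_norm([[t - a for t, a in zip(tr, ar)] for tr, ar in zip(tile, approx)])
print("rank:", len(B), "relative error:", err / fro_norm(tile))
```

Because each tile stops at its own rank, a batch of tiles produces factors of different sizes. That irregularity is what the abstract refers to when it mentions dynamic batched operations on the GPU; this sketch handles one tile at a time and does not attempt to model the batching.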

Original language: English (US)
Title of host publication: 2024 SIAM Conference on Parallel Processing for Scientific Computing, PP 2024
Editors: Michael Bader, Anshu Dubey, Bethany Lusch
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 65-77
Number of pages: 13
ISBN (Electronic): 9781713893479
State: Published - 2024
Event: 22nd SIAM Conference on Parallel Processing for Scientific Computing, PP 2024 - Baltimore, United States
Duration: Mar 5 2024 to Mar 8 2024

Publication series

Name: 2024 SIAM Conference on Parallel Processing for Scientific Computing, PP 2024

Conference

Conference: 22nd SIAM Conference on Parallel Processing for Scientific Computing, PP 2024
Country/Territory: United States
City: Baltimore
Period: 03/5/24 to 03/8/24

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • General Mathematics
