Abstract
We present a scalable design for accelerating the solution of dense linear systems of equations using LU decomposition. A novel systolic array architecture that can be used as a building block in scientific applications is described and prototyped on a Xilinx Virtex-6 FPGA. The solver has a throughput of around 3.2 million linear systems per second for matrices of size N=4 and around 80 thousand linear systems per second for matrices of size N=16. In comparison with similar work, our design offers up to a 12-fold improvement in speed whilst requiring up to 50% fewer hardware resources. As a result, a linear system of size N=64 can be implemented on a single FPGA, whereas previous work was limited to a size of N=12 and resorted to complex multi-FPGA architectures to scale. Finally, the scalable design can be adapted to problems of different sizes with minimal effort. © 2014 IEEE.
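For readers unfamiliar with the underlying computation, the following is a minimal software sketch of solving a dense system Ax = b by LU decomposition followed by forward and back substitution. It is not the systolic array architecture from the paper: the function name `lu_solve` is hypothetical, and the use of partial pivoting is an assumption made here for numerical safety, since the abstract does not say whether the hardware design pivots.

```python
def lu_solve(a, b):
    """Solve a x = b for a dense square matrix a (illustrative sketch only).

    Assumption: partial pivoting is used; the abstract does not specify
    the pivoting strategy of the hardware design.
    """
    n = len(a)
    # Work on copies so the caller's data is left untouched.
    lu = [[float(v) for v in row] for row in a]
    x = [float(v) for v in b]

    for k in range(n):
        # Partial pivoting: choose the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate entries below the pivot, storing the multipliers (L) in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]

    # Forward substitution: solve L y = P b (L has an implicit unit diagonal).
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]

    # Back substitution: solve U x = y.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

The elimination loop performs on the order of N^3 multiply-subtract operations, which is the regular, data-parallel work that a systolic array is suited to spreading across processing elements.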
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 186-187 |
Number of pages | 2 |
ISBN (Print) | 9781479936090 |
State | Published - Jan 1 2014 |
Externally published | Yes |