Floating-Point Sparse Matrix-Vector Multiply for FPGAs

Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver anywhere near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel; in fact, they rarely achieve 33% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. Our implementation consists of a logic design as well as scheduling and data-placement techniques. For benchmark matrices from the Matrix Market Suite we project 1.5 double-precision Gflops for a single VirtexII-6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA). We also analyze the asymptotic efficiency of our architecture as parallelism scales, using a constant-Rent-parameter matrix model; this demonstrates that our data-placement techniques provide an asymptotic scaling benefit.

While FPGA performance is attractive, higher performance is possible if we re-balance the hardware resources in FPGAs with embedded memories. We show that sacrificing half the logic area for memory area rarely degrades performance and, for large matrices, improves it by up to 5 times. We also evaluate the performance effect of adding custom floating-point units, using a simple area model to preserve total chip area. Sacrificing logic for memory and custom floating-point units increases single-FPGA performance to 5 double-precision Gflops.
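The SMVM kernel at the heart of this work computes y = A*x for a sparse matrix A. As a point of reference only, the following is a minimal C sketch of the kernel assuming a conventional compressed sparse row (CSR) layout; the storage format and variable names are illustrative assumptions, not the data structures or FPGA pipeline described in the thesis.

#include <stddef.h>

/* Minimal sketch: y = A*x with A stored in compressed sparse row (CSR) form.
 * Illustrative only; not the thesis' FPGA design, which adds static
 * scheduling and data placement across distributed on-chip memories. */
void smvm_csr(size_t n_rows,
              const size_t *row_ptr,   /* length n_rows + 1: row start offsets */
              const size_t *col_idx,   /* length nnz: column index of each nonzero */
              const double *values,    /* length nnz: nonzero values */
              const double *x,         /* dense input vector */
              double *y)               /* dense output vector */
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* accumulate the nonzeros of row i */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += values[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

The irregular, data-dependent gather x[col_idx[k]] is why general-purpose processors rarely reach a third of peak on this kernel, and it is the access pattern the FPGA implementation targets with its scheduling and data-placement techniques.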


Bibliographic Details
Main Author: deLorimier, Michael John
Format: Others
Published: 2005
Online Access: https://thesis.library.caltech.edu/1776/1/smvm_thesis.pdf
deLorimier, Michael John (2005) Floating-Point Sparse Matrix-Vector Multiply for FPGAs. Master's thesis, California Institute of Technology. doi:10.7907/FCCD-FA51. https://resolver.caltech.edu/CaltechETD:etd-05132005-144347