Accelerator for Flexible QR Decomposition and Back Substitution

abstract: QR decomposition (QRD) of a matrix is one of the most common linear algebra operations used for the decomposition of a square/non-square matrix. It has a wide range of applications, especially in Multiple Input-Multiple Output (MIMO) communication systems. Unfortunately it has high computati...


Bibliographic Details
Other Authors: Kanagala, Srimayee (Author)
Format: Dissertation
Language: English
Published: 2020
Subjects:
QRD
Online Access: http://hdl.handle.net/2286/R.I.63084
id ndltd-asu.edu-item-63084
record_format oai_dc
spelling ndltd-asu.edu-item-63084 2021-01-15T05:01:21Z Accelerator for Flexible QR Decomposition and Back Substitution abstract: QR decomposition (QRD) of a matrix is one of the most common linear algebra operations used for the decomposition of a square/non-square matrix. It has a wide range of applications, especially in Multiple Input-Multiple Output (MIMO) communication systems. Unfortunately, it has high computation complexity: for a matrix of size n x n, QRD has O(n^3) complexity, and back substitution, which is used to solve a system of linear equations, has O(n^2) complexity. Thus, as the matrix size increases, the hardware resource requirement for QRD and back substitution increases significantly. This thesis presents the design and implementation of a flexible QRD and back substitution accelerator using a folded architecture. It can support matrix sizes of 4x4, 8x8, 12x12, 16x16, and 20x20 with low hardware resource requirement. The proposed architecture is based on the systolic array implementation of the Givens algorithm for QRD. It is built with three different types of computation blocks which are connected in a 2-D array structure. These blocks are controlled by a scheduler which facilitates reusability of the blocks to perform computation for any input matrix size which is a multiple of 4. These blocks are designed using two basic programming elements which support both the forward and backward paths to compute matrix R in QRD and column-matrix X in back substitution computation. The proposed architecture has been mapped to Xilinx Zynq Ultrascale+ FPGA (Field Programmable Gate Array), ZCU102. All inputs are complex with precision of 40 bits (38 fractional bits and 1 signed bit). The architecture can be clocked at 50 MHz. The synthesis results of the folded architecture for different matrix sizes are presented. The results show that the folded architecture can support QRD and back substitution for inputs of large sizes which otherwise cannot fit on an FPGA when implemented using a flat architecture. The memory sizes required for different matrix sizes are also presented.
Dissertation/Thesis Kanagala, Srimayee (Author) Chakrabarti, Chaitali (Advisor) Bliss, Daniel (Committee member) Cao, Yu (Kevin) (Committee member) Arizona State University (Publisher) Computer engineering Back Substitution Flexible Architecture Folded Architecture FPGA Hardware Accelerator QRD eng 68 pages Masters Thesis Electrical Engineering 2020 Masters Thesis http://hdl.handle.net/2286/R.I.63084 http://rightsstatements.org/vocab/InC/1.0/ 2020
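As an illustration of the two computations the abstract names, the following is a minimal software sketch of Givens-rotation QRD followed by back substitution for a square complex system A x = b. It is not the thesis's design: it runs sequentially in double-precision floating point rather than as a folded systolic array in 40-bit fixed point, and the function name givens_qr_solve and the augmented-matrix layout are assumptions made for this example. It also assumes A is nonsingular.

```python
import math


def givens_qr_solve(A, b):
    """Solve A x = b for a square, nonsingular complex matrix A.

    Hypothetical helper for illustration only. Returns (x, M), where M is
    the augmented matrix [R | Q^H b] after all rotations have been applied.
    """
    n = len(A)
    # Augment A with b so each rotation is applied to the right-hand side too.
    M = [[complex(v) for v in row] + [complex(b[i])] for i, row in enumerate(A)]

    # Forward path: zero the subdiagonal entries column by column, O(n^3) work.
    for col in range(n):
        for row in range(col + 1, n):
            a, z = M[col][col], M[row][col]
            r = math.hypot(abs(a), abs(z))
            if r == 0.0:
                continue  # both entries are already zero, nothing to rotate
            c, s = a / r, z / r
            # Unitary rotation [[conj(c), conj(s)], [-s, c]] maps (a, z) to (r, 0).
            for k in range(col, n + 1):
                top, bot = M[col][k], M[row][k]
                M[col][k] = c.conjugate() * top + s.conjugate() * bot
                M[row][k] = -s * top + c * bot

    # Backward path: back substitution on the triangular system, O(n^2) work.
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x, M
```

The triple loop in the forward path and the double loop in the backward path correspond to the O(n^3) and O(n^2) costs quoted in the abstract. Calling givens_qr_solve(A, b) on a 4x4 complex matrix returns the solution vector together with the triangularized augmented matrix.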
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Computer engineering
Back Substitution
Flexible Architecture
Folded Architecture
FPGA
Hardware Accelerator
QRD
author2 Kanagala, Srimayee (Author)
author_facet Kanagala, Srimayee (Author)
title Accelerator for Flexible QR Decomposition and Back Substitution
publishDate 2020
url http://hdl.handle.net/2286/R.I.63084
_version_ 1719373039484796928