Accelerator for Flexible QR Decomposition and Back Substitution

abstract: QR decomposition (QRD) of a matrix is one of the most common linear algebra operationsused for the decomposition of a square/non-square matrix. It has a wide range of applications especially in Multiple Input-Multiple Output (MIMO) communication systems. Unfortunately it has high computati...

Full description

Bibliographic Details
Other Authors: Kanagala, Srimayee (Author)
Format: Dissertation
Language:English
Published: 2020
Subjects:
QRD
Online Access:http://hdl.handle.net/2286/R.I.63084
Description
Summary:abstract: QR decomposition (QRD) of a matrix is one of the most common linear algebra operationsused for the decomposition of a square/non-square matrix. It has a wide range of applications especially in Multiple Input-Multiple Output (MIMO) communication systems. Unfortunately it has high computation complexity { for matrix size of nxn, QRD has O(n3) complexity and back substitution, which is used to solve a system of linear equations, has O(n2) complexity. Thus, as the matrix size increases, the hardware resource requirement for QRD and back substitution increases signicantly. This thesis presents the design and implementation of a exible QRD and back substitution accelerator using a folded architecture. It can support matrix sizes of 4x4, 8x8, 12x12, 16x16, and 20x20 with low hardware resource requirement. The proposed architecture is based on the systolic array implementation of the Givens algorithm for QRD. It is built with three dierent types of computation blocks which are connected in a 2-D array structure. These blocks are controlled by a scheduler which facilitates reusability of the blocks to perform computation for any input matrix size which is a multiple of 4. These blocks are designed using two basic programming elements which support both the forward and backward paths to compute matrix R in QRD and column-matrix X in back substitution computation. The proposed architecture has been mapped to Xilinx Zynq Ultrascale+ FPGA (Field Programmable Gate Array), ZCU102. All inputs are complex with precision of 40 bits (38 fractional bits and 1 signed bit). The architecture can be clocked at 50 MHz. The synthesis results of the folded architecture for dierent matrix sizes are presented. The results show that the folded architecture can support QRD and back substitution for inputs of large sizes which otherwise cannot t on an FPGA when implemented using a at architecture. The memory sizes required for dierent matrix sizes are also presented. === Dissertation/Thesis === Masters Thesis Electrical Engineering 2020