Design and Implementation of an SVD Processor for MIMO Precoding Systems

Bibliographic Details
Main Author: Chun-Hung Wu (吳俊弘)
Other Authors: Pei-Yun Tsai
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/77dvfj
Description
Summary: Master's thesis, National Central University, Department of Electrical Engineering, academic year 105 (2016–2017).

Large-scale MIMO (multiple-input multiple-output) is considered a promising solution for fifth-generation (5G) wireless communication systems. As the number of antennas at both the transmitter and the receiver grows, so does the computational complexity. Singular value decomposition (SVD) is widely used to decompose the channel matrix into several spatial sub-channels; usually, the sub-channels with the largest channel gains (singular values) are selected for transmission. A three-stage algorithm is used, and every stage can be carried out with Givens rotations. The first stage performs bi-diagonalization, the second stage applies Golub-Reinsch SVD with split and deflation, and the third stage uses shifted QR with early termination. The SVD procedure can trade convergence speed against accuracy according to the system requirements.

For SVD in 8×8 MIMO systems, the hardware implementation mainly targets the first- and second-stage operations, and it supports channel matrix dimensions from 2×2 to 8×8. The datapath uses floating-point representation externally and fixed-point representation internally. The Givens rotations are realized by Coordinate Rotation Digital Computer (CORDIC) units, and two CORDIC modules constitute one processing element (PE). The first stage takes 171 clock cycles, and the entire decomposition takes 313 clock cycles when the split-and-deflation threshold of the second stage is set to 2^(-3). The design is implemented in TSMC 40 nm CMOS technology, with a maximum operating frequency of 185 MHz and a throughput of 591K matrices per second.
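The first stage, bi-diagonalization by Givens rotations, can be sketched in floating-point Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the thesis's hardware schedule: it handles only real-valued m×n matrices with m ≥ n (complex channel matrices would need additional phase rotations), and the names `bidiagonalize_givens`, `_givens`, and `_rotate_pair` are hypothetical. Each rotation acts on a pair of rows or columns, the kind of operation the abstract maps onto CORDIC.

```python
import numpy as np


def _givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)


def _rotate_pair(M, i, k, c, s, axis):
    """Apply [c s; -s c] to rows (axis=0) or columns (axis=1) i and k of M in place."""
    if axis == 0:
        vi, vk = M[i, :].copy(), M[k, :].copy()
        M[i, :], M[k, :] = c * vi + s * vk, -s * vi + c * vk
    else:
        vi, vk = M[:, i].copy(), M[:, k].copy()
        M[:, i], M[:, k] = c * vi + s * vk, -s * vi + c * vk


def bidiagonalize_givens(A):
    """Reduce a real m x n matrix (m >= n) to upper bidiagonal B with A = U @ B @ V.T."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        # Left rotations on adjacent rows, bottom-up, clear column k below the diagonal.
        for i in range(m - 1, k, -1):
            c, s = _givens(B[i - 1, k], B[i, k])
            _rotate_pair(B, i - 1, i, c, s, axis=0)
            _rotate_pair(U, i - 1, i, c, s, axis=1)  # keeps A = U @ B @ V.T
            B[i, k] = 0.0  # remove rounding residue of the eliminated entry
        # Right rotations on adjacent columns, right-to-left, clear row k past the superdiagonal.
        for j in range(n - 1, k + 1, -1):
            c, s = _givens(B[k, j - 1], B[k, j])
            _rotate_pair(B, j - 1, j, c, s, axis=1)
            _rotate_pair(V, j - 1, j, c, s, axis=1)  # keeps A = U @ B @ V.T
            B[k, j] = 0.0
    return U, B, V
```

The elimination order (bottom-up within a column, right-to-left within a row) is one common choice; with it, no rotation refills an entry that an earlier rotation already cleared.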
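The second stage can be illustrated with a Golub-Kahan implicit-shift sweep (Wilkinson shift) on the bidiagonal matrix. The matrix is split wherever a superdiagonal entry falls below a threshold, and converged singular values are deflated from the bottom. The sketch rests on several assumptions that the abstract does not state. The split test is taken to be relative, |e_i| ≤ tol·(|d_i| + |d_{i+1}|), with the default `tol = 2**-3` borrowed from the reported threshold. Only singular values are computed (U and V are not accumulated). The special handling Golub-Reinsch applies when a diagonal entry becomes zero is omitted. All function names are hypothetical.

```python
import numpy as np


def _givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)


def _rot_cols(M, i, k, c, s):
    vi, vk = M[:, i].copy(), M[:, k].copy()
    M[:, i], M[:, k] = c * vi + s * vk, -s * vi + c * vk


def _rot_rows(M, i, k, c, s):
    vi, vk = M[i, :].copy(), M[k, :].copy()
    M[i, :], M[k, :] = c * vi + s * vk, -s * vi + c * vk


def _gk_step(B, p, q):
    """One implicit-shift Golub-Kahan sweep on the unreduced block B[p:q+1, p:q+1]."""
    # Wilkinson shift: eigenvalue of the trailing 2x2 of B^T B closest to its last entry.
    t11 = B[q - 1, q - 1] ** 2 + (B[q - 2, q - 1] ** 2 if q - 1 > p else 0.0)
    t12 = B[q - 1, q - 1] * B[q - 1, q]
    t22 = B[q, q] ** 2 + B[q - 1, q] ** 2
    ev = np.linalg.eigvalsh(np.array([[t11, t12], [t12, t22]]))
    mu = ev[np.argmin(np.abs(ev - t22))]
    y, z = B[p, p] ** 2 - mu, B[p, p] * B[p, p + 1]
    for k in range(p, q):
        c, s = _givens(y, z)          # right rotation on columns k, k+1
        _rot_cols(B, k, k + 1, c, s)
        if k > p:
            B[k - 1, k + 1] = 0.0     # bulge above the superdiagonal has been chased
        c, s = _givens(B[k, k], B[k + 1, k])
        _rot_rows(B, k, k + 1, c, s)  # left rotation on rows k, k+1
        B[k + 1, k] = 0.0             # bulge below the diagonal has been chased
        if k < q - 1:
            y, z = B[k, k + 1], B[k, k + 2]


def bidiagonal_singular_values(d, e, tol=2.0 ** -3, max_iter=1000):
    """Singular values of the upper bidiagonal matrix with diagonal d, superdiagonal e."""
    B = np.diag(np.asarray(d, float)) + np.diag(np.asarray(e, float), 1)
    n = B.shape[0]
    for _ in range(max_iter):
        # Split: treat a superdiagonal entry as zero once it is small next to its neighbours.
        for i in range(n - 1):
            if abs(B[i, i + 1]) <= tol * (abs(B[i, i]) + abs(B[i + 1, i + 1])):
                B[i, i + 1] = 0.0
        # Deflation: skip converged trailing 1x1 blocks, then find the unreduced block above.
        q = n - 1
        while q > 0 and B[q - 1, q] == 0.0:
            q -= 1
        if q == 0:
            break
        p = q - 1
        while p > 0 and B[p - 1, p] != 0.0:
            p -= 1
        _gk_step(B, p, q)
    return np.sort(np.abs(np.diag(B)))[::-1]
```

A larger `tol` stops earlier and returns coarser singular values, which mirrors the speed/accuracy trade-off the abstract describes. With a tight tolerance such as `tol=1e-12`, the result should agree closely with `numpy.linalg.svd` on a well-conditioned bidiagonal input.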
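The following shows one way two CORDIC modules could realize a single Givens rotation. A vectoring-mode CORDIC drives the pivot pair (a, b) onto the x-axis and records its micro-rotation directions. A rotation-mode CORDIC then replays those directions on the remaining element pairs of the two rows. This pairing of modes is an assumption about how the two CORDIC modules in a PE might cooperate, not the thesis's documented design. The sketch also uses floating point rather than the internal fixed-point format, a hypothetical 16 iterations, and an explicit gain-compensation multiply. All names are hypothetical.

```python
import math

N_ITER = 16  # hypothetical number of micro-rotations (sets the angle resolution)
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]
INV_GAIN = 1.0
for _i in range(N_ITER):
    INV_GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * _i))  # undoes the CORDIC gain (about 1.6468)


def _micro_rotate(x, y, sigma, i):
    """One shift-and-add micro-rotation by sigma * atan(2^-i), unscaled."""
    return x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the +x axis; return (magnitude, angle, direction record)."""
    z, quad = 0.0, 0
    if x < 0:  # pre-rotate by -90 or +90 degrees so the iterations can converge
        if y >= 0:
            x, y, z, quad = y, -x, math.pi / 2, 1
        else:
            x, y, z, quad = -y, x, -math.pi / 2, -1
    sigmas = []
    for i in range(N_ITER):
        sigma = -1 if y > 0 else 1  # always rotate toward the x axis
        x, y = _micro_rotate(x, y, sigma, i)
        z -= sigma * ATAN_TABLE[i]  # accumulates the original angle atan2(y0, x0)
        sigmas.append(sigma)
    return x * INV_GAIN, z, (quad, sigmas)


def cordic_rotation(x, y, record):
    """Replay a recorded pre-rotation and direction sequence on another (x, y) pair."""
    quad, sigmas = record
    if quad == 1:
        x, y = y, -x
    elif quad == -1:
        x, y = -y, x
    for i, sigma in enumerate(sigmas):
        x, y = _micro_rotate(x, y, sigma, i)
    return x * INV_GAIN, y * INV_GAIN


def givens_rows_cordic(row_a, row_b, pivot):
    """Rotate two rows so that row_b[pivot] becomes zero, using the two CORDIC modes."""
    r, _, record = cordic_vectoring(row_a[pivot], row_b[pivot])
    out_a, out_b = list(row_a), list(row_b)
    for k in range(len(row_a)):
        if k == pivot:
            out_a[k], out_b[k] = r, 0.0
        else:
            out_a[k], out_b[k] = cordic_rotation(row_a[k], row_b[k], record)
    return out_a, out_b
```

In this arrangement only the direction bits pass from the vectoring unit to the rotation unit, so no explicit angle has to be stored. The `z` accumulator is kept here only to show that the rotation angle is available if needed.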
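The reported throughput follows from the cycle count and the maximum clock frequency. The calculation below assumes that one decomposition completes every 313 cycles and that consecutive matrices do not overlap:

```latex
\text{throughput} = \frac{f_{\max}}{N_{\text{cycles}}}
  = \frac{185 \times 10^{6}\ \text{cycles/s}}{313\ \text{cycles/matrix}}
  \approx 5.91 \times 10^{5}\ \text{matrices/s}
```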