Software-defined pulse-doppler radar signal processing on graphics processors

Modern pulse-Doppler radars use digital receivers with high speed ADCs and sophisticated radar signal processors that necessitate high data rates, computationally intensive processing, and strict latency requirements. Data-independent processing is performed as the first stage and requires the hig...

Full description

Bibliographic Details
Main Author: Venter, Christian Jacobus
Other Authors: Grobler, H.
Language:en
Published: University of Pretoria 2015
Subjects:
Online Access:http://hdl.handle.net/2263/43276
Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276>
id ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-43276
record_format oai_dc
collection NDLTD
language en
sources NDLTD
topic UCTD
spellingShingle UCTD
Venter, Christian Jacobus
Software-defined pulse-doppler radar signal processing on graphics processors
description Modern pulse-Doppler radars use digital receivers with high speed ADCs and sophisticated radar signal processors that necessitate high data rates, computationally intensive processing, and strict latency requirements. Data-independent processing is performed as the first stage and requires the highest data and computational rates of between 1 Gigaops to 1 Teraops, traditionally reserved for specialized circuits that typically employ restrictive fixed-point arithmetic. The first stage generally requires FIR filters, correlation, Fourier transforms, and matrix-vector algebra on multi-dimensional data, which provides a range of demanding and interesting computational challenges, and that present ample opportunities for parallel processing. Modern many-core GPUs provide general-purpose computation on the GPU (GPGPU) for high-performance computing applications through fully programmable pipelines, high memory bandwidths of up to hundreds of Gigabytes per second and high floatingpoint computational performance of up to several Teraflops on a single chip. The massively-parallel GPU architecture is well-suited for intrinsically parallel applications that require high dynamic range, such as radar signal processing. However, numerous factors have to be considered in order to realize the massive performance potential through a conventionally unfamiliar stream-programming paradigm. Explicit control is also granted over a deep memory hierarchy and parallelism at various granularities within an optimization space that is considered non-linear in many respects. The aim of this research is to address and characterize the challenges and intricacies of using modern GPUs with GPGPU capabilities for the computationally demanding software-defined pulse-Doppler radar signal processing application. A single receiver-element, coherent pulse-Doppler system with a two-dimensional data storage model was assumed, due to widespread use and the interesting challenges and opportunities that it provides for parallel implementation on the GPU architecture. The NVIDIA Tesla C1060 GPU and CUDA were selected as a suitable GPGPU platform for the implementation using single-precision floating-point arithmetic. A set of microbenchmarks was first developed to isolate and highlight fundamental traits and relevant features of the GPU architecture, in order to determine their impact in the radar application context. The common digital pulse compression (DPC), corner turning (CT), Doppler filtering (DF), envelope (ENV) and constant false-alarm rate (CFAR) processing functions were then implemented and optimized for the GPU architecture. Multiple algorithmic variants were implemented, where appropriate, to evaluate the efficiency of different algorithmic structures on the GPU architecture. These functions were then integrated to form a radar signal processing chain, which allowed for further holistic optimization under realistic conditions. An experimental framework and simple analytical framework was developed and utilized for analyzing low-level kernel performance and high-level system performance for individual functions and the processing chain. The microbenchmark results highlighted the severity of uncoalesced device memory access as well as the importance of high arithmetic intensity to achieve high computational throughput, and an asymmetry in performance for primitive math operations. Further, the microbenchmark results showed that memory transfer performance for small buffers or effectively small radar bursts is fundamentally poor, but also that memory transfer can be efficiently overlapped with computation, reducing the impact of slow transfers in general. For the DPC and DF functions, the FFT-based variants using the CUFFT library proved optimal. For the CT function, the use of shared memory is vital to achieve fully coalesced transfers, and the lesser-known, but potentially highly detrimental, partition camping effect needs to be addressed. For the CFAR function, the segmentation into separate processing stages for rows and columns proved the most vital overall optimization. The ENV function along with several simple GPU helper-kernels with low arithmetic intensity such as padding, scaling, and the window function were found to be bandwidth-limited, as expected, and hence performs comparably to a pure copy kernel. Based on the findings, pulse-Doppler radar signal processing on GPUs is highly feasible for medium to large burst sizes, provided that the main performance contributors and detractors for the target GPU architecture is well understood and adhered to. === Dissertation (MEng)--University of Pretoria, 2014. === lk2014 === Electrical, Electronic and Computer Engineering === MEng === Unrestricted
author2 Grobler, H.
author_facet Grobler, H.
Venter, Christian Jacobus
author Venter, Christian Jacobus
author_sort Venter, Christian Jacobus
title Software-defined pulse-doppler radar signal processing on graphics processors
title_short Software-defined pulse-doppler radar signal processing on graphics processors
title_full Software-defined pulse-doppler radar signal processing on graphics processors
title_fullStr Software-defined pulse-doppler radar signal processing on graphics processors
title_full_unstemmed Software-defined pulse-doppler radar signal processing on graphics processors
title_sort software-defined pulse-doppler radar signal processing on graphics processors
publisher University of Pretoria
publishDate 2015
url http://hdl.handle.net/2263/43276
Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276>
work_keys_str_mv AT venterchristianjacobus softwaredefinedpulsedopplerradarsignalprocessingongraphicsprocessors
_version_ 1719316326276661248
spelling ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-432762020-06-02T03:18:20Z Software-defined pulse-doppler radar signal processing on graphics processors Venter, Christian Jacobus Grobler, H. cventer@csir.co.za UCTD Modern pulse-Doppler radars use digital receivers with high speed ADCs and sophisticated radar signal processors that necessitate high data rates, computationally intensive processing, and strict latency requirements. Data-independent processing is performed as the first stage and requires the highest data and computational rates of between 1 Gigaops to 1 Teraops, traditionally reserved for specialized circuits that typically employ restrictive fixed-point arithmetic. The first stage generally requires FIR filters, correlation, Fourier transforms, and matrix-vector algebra on multi-dimensional data, which provides a range of demanding and interesting computational challenges, and that present ample opportunities for parallel processing. Modern many-core GPUs provide general-purpose computation on the GPU (GPGPU) for high-performance computing applications through fully programmable pipelines, high memory bandwidths of up to hundreds of Gigabytes per second and high floatingpoint computational performance of up to several Teraflops on a single chip. The massively-parallel GPU architecture is well-suited for intrinsically parallel applications that require high dynamic range, such as radar signal processing. However, numerous factors have to be considered in order to realize the massive performance potential through a conventionally unfamiliar stream-programming paradigm. Explicit control is also granted over a deep memory hierarchy and parallelism at various granularities within an optimization space that is considered non-linear in many respects. The aim of this research is to address and characterize the challenges and intricacies of using modern GPUs with GPGPU capabilities for the computationally demanding software-defined pulse-Doppler radar signal processing application. A single receiver-element, coherent pulse-Doppler system with a two-dimensional data storage model was assumed, due to widespread use and the interesting challenges and opportunities that it provides for parallel implementation on the GPU architecture. The NVIDIA Tesla C1060 GPU and CUDA were selected as a suitable GPGPU platform for the implementation using single-precision floating-point arithmetic. A set of microbenchmarks was first developed to isolate and highlight fundamental traits and relevant features of the GPU architecture, in order to determine their impact in the radar application context. The common digital pulse compression (DPC), corner turning (CT), Doppler filtering (DF), envelope (ENV) and constant false-alarm rate (CFAR) processing functions were then implemented and optimized for the GPU architecture. Multiple algorithmic variants were implemented, where appropriate, to evaluate the efficiency of different algorithmic structures on the GPU architecture. These functions were then integrated to form a radar signal processing chain, which allowed for further holistic optimization under realistic conditions. An experimental framework and simple analytical framework was developed and utilized for analyzing low-level kernel performance and high-level system performance for individual functions and the processing chain. The microbenchmark results highlighted the severity of uncoalesced device memory access as well as the importance of high arithmetic intensity to achieve high computational throughput, and an asymmetry in performance for primitive math operations. Further, the microbenchmark results showed that memory transfer performance for small buffers or effectively small radar bursts is fundamentally poor, but also that memory transfer can be efficiently overlapped with computation, reducing the impact of slow transfers in general. For the DPC and DF functions, the FFT-based variants using the CUFFT library proved optimal. For the CT function, the use of shared memory is vital to achieve fully coalesced transfers, and the lesser-known, but potentially highly detrimental, partition camping effect needs to be addressed. For the CFAR function, the segmentation into separate processing stages for rows and columns proved the most vital overall optimization. The ENV function along with several simple GPU helper-kernels with low arithmetic intensity such as padding, scaling, and the window function were found to be bandwidth-limited, as expected, and hence performs comparably to a pure copy kernel. Based on the findings, pulse-Doppler radar signal processing on GPUs is highly feasible for medium to large burst sizes, provided that the main performance contributors and detractors for the target GPU architecture is well understood and adhered to. Dissertation (MEng)--University of Pretoria, 2014. lk2014 Electrical, Electronic and Computer Engineering MEng Unrestricted 2015-01-19T12:13:23Z 2015-01-19T12:13:23Z 2014/12/12 2014 Dissertation http://hdl.handle.net/2263/43276 Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276> M14/9/473 21005550 en © 2014 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. University of Pretoria