Software-defined pulse-doppler radar signal processing on graphics processors
Modern pulse-Doppler radars use digital receivers with high speed ADCs and sophisticated radar signal processors that necessitate high data rates, computationally intensive processing, and strict latency requirements. Data-independent processing is performed as the first stage and requires the hig...
Main Author: | Venter, Christian Jacobus |
---|---|
Other Authors: | Grobler, H. |
Language: | en |
Published: | University of Pretoria 2015 |
Subjects: | UCTD |
Online Access: | http://hdl.handle.net/2263/43276 Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276> |
id |
ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-43276 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en |
sources |
NDLTD |
topic |
UCTD |
spellingShingle |
UCTD Venter, Christian Jacobus Software-defined pulse-doppler radar signal processing on graphics processors |
description |
Modern pulse-Doppler radars use digital receivers with high speed ADCs and sophisticated radar signal
processors that necessitate high data rates, computationally intensive processing, and strict latency
requirements. Data-independent processing is performed as the first stage and requires the highest
data and computational rates of between 1 Gigaops and 1 Teraops, traditionally reserved for specialized
circuits that typically employ restrictive fixed-point arithmetic. The first stage generally requires FIR
filters, correlation, Fourier transforms, and matrix-vector algebra on multi-dimensional data, which
provide a range of demanding and interesting computational challenges and present ample opportunities
for parallel processing. Modern many-core GPUs provide general-purpose computation
on the GPU (GPGPU) for high-performance computing applications through fully programmable
pipelines, high memory bandwidths of up to hundreds of Gigabytes per second and high floating-point
computational performance of up to several Teraflops on a single chip. The massively-parallel
GPU architecture is well-suited for intrinsically parallel applications that require high dynamic range,
such as radar signal processing. However, numerous factors have to be considered in order to realize
the massive performance potential through a conventionally unfamiliar stream-programming
paradigm. Explicit control is also granted over a deep memory hierarchy and parallelism at various
granularities within an optimization space that is considered non-linear in many respects.
The aim of this research is to address and characterize the challenges and intricacies of using modern
GPUs with GPGPU capabilities for the computationally demanding software-defined pulse-Doppler
radar signal processing application. A single receiver-element, coherent pulse-Doppler system with
a two-dimensional data storage model was assumed, due to widespread use and the interesting challenges
and opportunities that it provides for parallel implementation on the GPU architecture. The
NVIDIA Tesla C1060 GPU and CUDA were selected as a suitable GPGPU platform for the implementation
using single-precision floating-point arithmetic. A set of microbenchmarks was first
developed to isolate and highlight fundamental traits and relevant features of the GPU architecture, in
order to determine their impact in the radar application context. The common digital pulse compression
(DPC), corner turning (CT), Doppler filtering (DF), envelope (ENV) and constant false-alarm
rate (CFAR) processing functions were then implemented and optimized for the GPU architecture.
Multiple algorithmic variants were implemented, where appropriate, to evaluate the efficiency of different
algorithmic structures on the GPU architecture. These functions were then integrated to form
a radar signal processing chain, which allowed for further holistic optimization under realistic conditions.
An experimental framework and a simple analytical framework were developed and used for
analyzing low-level kernel performance and high-level system performance for individual functions
and the processing chain.
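The chain of functions named above can be illustrated with a minimal pure-Python sketch. It is not the dissertation's CUDA implementation: the naive DFT stands in for an FFT library such as CUFFT, and the function names, the three-chip reference code, and the toy burst are all assumptions made for illustration. CFAR is omitted here. The sketch only shows how a two-dimensional burst matrix could flow through DPC, CT, DF and ENV.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def pulse_compress(pulse_echo, reference):
    """DPC: time-domain correlation of one received pulse with the reference."""
    n, m = len(pulse_echo), len(reference)
    return [sum(pulse_echo[i + j] * reference[j].conjugate() for j in range(m))
            for i in range(n - m + 1)]


def corner_turn(matrix):
    """CT: transpose so each row holds one range bin across all pulses."""
    return [list(col) for col in zip(*matrix)]


def doppler_filter(range_bin_rows):
    """DF: spectrum across pulses (slow time) for every range bin."""
    return [dft(row) for row in range_bin_rows]


def envelope(matrix):
    """ENV: magnitude of every complex cell."""
    return [[abs(v) for v in row] for row in matrix]


def process_burst(burst, reference):
    compressed = [pulse_compress(p, reference) for p in burst]  # pulses x range
    turned = corner_turn(compressed)                            # range x pulses
    spectra = doppler_filter(turned)                            # range x Doppler
    return envelope(spectra)


# Hypothetical toy burst: 4 pulses x 8 range samples, one target at range
# delay 2 whose phase advances a quarter cycle from pulse to pulse.
reference = [1 + 0j, 1 + 0j, -1 + 0j]
burst = []
for p in range(4):
    phase = cmath.exp(2j * cmath.pi * p / 4)
    pulse = [0j] * 8
    for j, chip in enumerate(reference):
        pulse[2 + j] = chip * phase
    burst.append(pulse)

rd_map = process_burst(burst, reference)
peak_value, peak_range, peak_doppler = max(
    (value, r, d) for r, row in enumerate(rd_map) for d, value in enumerate(row))
print(peak_range, peak_doppler, round(peak_value, 6))
```

With this toy input the strongest cell of the range-Doppler map falls at range bin 2 and Doppler bin 1, which matches the delay and per-pulse phase step injected into the burst.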
The microbenchmark results highlighted the severity of uncoalesced device memory access as well as
the importance of high arithmetic intensity to achieve high computational throughput, and an asymmetry
in performance for primitive math operations. Further, the microbenchmark results showed
that memory transfer performance for small buffers or effectively small radar bursts is fundamentally
poor, but also that memory transfer can be efficiently overlapped with computation, reducing the impact
of slow transfers in general. For the DPC and DF functions, the FFT-based variants using the
CUFFT library proved optimal. For the CT function, the use of shared memory is vital to achieve fully
coalesced transfers, and the lesser-known, but potentially highly detrimental, partition camping effect
needs to be addressed. For the CFAR function, the segmentation into separate processing stages for
rows and columns proved the most vital overall optimization. The ENV function along with several
simple GPU helper-kernels with low arithmetic intensity such as padding, scaling, and the window
function were found to be bandwidth-limited, as expected, and hence perform comparably to a pure
copy kernel. Based on the findings, pulse-Doppler radar signal processing on GPUs is highly feasible
for medium to large burst sizes, provided that the main performance contributors and detractors for
the target GPU architecture are well understood and adhered to. === Dissertation (MEng)--University of Pretoria, 2014. === lk2014 === Electrical, Electronic and Computer Engineering === MEng === Unrestricted |
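One way a row/column split of the CFAR function can work is sketched below, assuming a cell-averaging CFAR with rectangular reference and guard windows. This record does not say which CFAR variant the dissertation used, so the window shape, the function names, the edge clipping and the test map are assumptions. The point of the sketch is that a two-dimensional window sum can be computed as a pass over rows followed by a pass over columns, and that the result equals the brute-force two-dimensional sum.

```python
def window_sums_1d(values, half_width):
    """Sum over [i - half_width, i + half_width] for each i, clipped at the edges."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [prefix[min(n, i + half_width + 1)] - prefix[max(0, i - half_width)]
            for i in range(n)]


def box_sums_separable(matrix, half_rows, half_cols):
    """Rectangle sums computed as a row pass followed by a column pass."""
    row_pass = [window_sums_1d(row, half_cols) for row in matrix]
    col_pass = [window_sums_1d(list(col), half_rows) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]


def box_sums_direct(matrix, half_rows, half_cols):
    """Brute-force 2-D rectangle sums, used only as a reference."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0.0
            for rr in range(max(0, r - half_rows), min(rows, r + half_rows + 1)):
                for cc in range(max(0, c - half_cols), min(cols, c + half_cols + 1)):
                    total += matrix[rr][cc]
            out_row.append(total)
        out.append(out_row)
    return out


def ca_cfar(power, ref_half, guard_half, scale):
    """Cell-averaging CFAR; reference cells = outer box minus guard box.

    Assumes ref_half > guard_half so every cell has at least one reference cell.
    """
    ones = [[1.0] * len(power[0]) for _ in power]
    outer = box_sums_separable(power, ref_half, ref_half)
    inner = box_sums_separable(power, guard_half, guard_half)
    outer_n = box_sums_separable(ones, ref_half, ref_half)
    inner_n = box_sums_separable(ones, guard_half, guard_half)
    detections = []
    for r, row in enumerate(power):
        for c, cell in enumerate(row):
            noise = (outer[r][c] - inner[r][c]) / (outer_n[r][c] - inner_n[r][c])
            if cell > scale * noise:
                detections.append((r, c))
    return detections


# Hypothetical 7 x 7 power map: unit noise floor with one strong cell.
power_map = [[1.0] * 7 for _ in range(7)]
power_map[3][3] = 50.0
detections = ca_cfar(power_map, ref_half=2, guard_half=1, scale=5.0)
print(detections)
```

The separable form touches each cell a fixed number of times per pass instead of once per window element, which is the kind of saving a row stage plus a column stage can offer.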
author2 |
Grobler, H. |
author_facet |
Grobler, H. Venter, Christian Jacobus |
author |
Venter, Christian Jacobus |
author_sort |
Venter, Christian Jacobus |
title |
Software-defined pulse-doppler radar signal processing on graphics processors |
title_short |
Software-defined pulse-doppler radar signal processing on graphics processors |
title_full |
Software-defined pulse-doppler radar signal processing on graphics processors |
title_fullStr |
Software-defined pulse-doppler radar signal processing on graphics processors |
title_full_unstemmed |
Software-defined pulse-doppler radar signal processing on graphics processors |
title_sort |
software-defined pulse-doppler radar signal processing on graphics processors |
publisher |
University of Pretoria |
publishDate |
2015 |
url |
http://hdl.handle.net/2263/43276 Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276> |
work_keys_str_mv |
AT venterchristianjacobus softwaredefinedpulsedopplerradarsignalprocessingongraphicsprocessors |
_version_ |
1719316326276661248 |
spelling |
ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-43276 2020-06-02T03:18:20Z Software-defined pulse-doppler radar signal processing on graphics processors Venter, Christian Jacobus Grobler, H. cventer@csir.co.za UCTD Dissertation (MEng)--University of Pretoria, 2014. lk2014 Electrical, Electronic and Computer Engineering MEng Unrestricted 2015-01-19T12:13:23Z 2015-01-19T12:13:23Z 2014/12/12 2014 Dissertation http://hdl.handle.net/2263/43276 Venter, CJ 2014, Software-defined pulse-doppler radar signal processing on graphics processors, MEng Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/43276> M14/9/473 21005550 en © 2014 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. University of Pretoria |