Performance Analysis of Non Local Means Algorithm using Hardware Accelerators

Image de-noising forms an integral part of image processing. It is used both as a standalone algorithm for improving the quality of images obtained through a camera and as a starting stage for image processing applications such as face recognition, super resolution, etc. Non Local Means (NL-Means) and B...


Bibliographic Details
Main Author: Antony, Daniel Sanju
Other Authors: Rathna, G N
Language: en_US
Published: 2017
Subjects:
GPU
Online Access: http://etd.iisc.ernet.in/handle/2005/2932
http://etd.ncsi.iisc.ernet.in/abstracts/3794/G27791-Abs.pdf
id ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2932
record_format oai_dc
collection NDLTD
language en_US
sources NDLTD
topic Algorithm using Open Computing Language
Image Denoising
Non Local Means Algorithm
OpenCL
Field-Programmable Gate Array (FPGA)
Open Computing Language
GPU
Graphics Processing Unit
Additive White Gaussian Noise (AWGN)
NL-Means Algorithm
Electrical Engineering
description Image de-noising forms an integral part of image processing. It is used both as a standalone algorithm for improving the quality of images obtained through a camera and as a starting stage for image processing applications such as face recognition, super resolution, etc. Non Local Means (NL-Means) and the Bilateral Filter are two computationally complex de-noising algorithms that can provide good de-noising results. Because of their computational complexity, the real-time applications of these filters are limited. In this thesis, we propose the use of hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) to speed up filter execution, and we implement the filters efficiently on them. The GPU-based implementation of these filters is carried out using the Open Computing Language (OpenCL). The basic objective of this research is to perform high-speed de-noising without compromising on quality. Here we implement a basic NL-Means filter, a Fast NL-Means filter, and a Bilateral filter using Gauss Polynomial decomposition on the GPU. We also propose a modification to the existing NL-Means algorithm and the Gauss Polynomial Bilateral filter: a box spatial kernel is used in place of the Gaussian spatial kernel of the standard algorithm to improve execution speed. This research work is a step towards making real-time implementation of these algorithms possible.
The results show that the NL-Means implementation on GPU using OpenCL is about 25x faster than a regular CPU-based implementation for larger images (1024x1024). For Fast NL-Means, the GPU-based implementation is about 90x faster than the CPU implementation. Even with the improved execution time, embedded-system applications of NL-Means are limited by the power and thermal restrictions of the GPU device. To create a lower-power, faster implementation, we have implemented the algorithm on an FPGA. FPGAs are reconfigurable devices that enable us to create a custom architecture for the parallel execution of the algorithm. For smaller images (256x256), the FPGA implementation was found to be about 200x faster than the CPU implementation and about 25x faster than GPU execution. Moreover, the power requirement of the FPGA design of the algorithm (0.53 W) is much lower than that of the CPU (30 W) and the GPU (200 W).
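The following minimal pure-Python sketch of pixelwise NL-Means illustrates the kernel substitution described in the abstract. The patch-distance weighting can be either a Gaussian kernel or a box (uniform) kernel. It assumes that the "spatial kernel" refers to the weights applied to pixel differences inside the comparison patch. The function names (`patch_kernel`, `nl_means`), the parameters (`search_radius`, `patch_radius`, `h`) and the border clamping are hypothetical illustrative choices. This is not the thesis's OpenCL or FPGA implementation.

```python
import math
from typing import List

Image = List[List[float]]


def patch_kernel(radius: int, kind: str, sigma: float = 1.0) -> List[List[float]]:
    """Weights for pixel differences in a (2r+1)x(2r+1) patch, normalized to sum to 1.

    Assumption: the "spatial kernel" is this patch weighting.
    kind="gaussian": Gaussian falloff from the patch centre.
    kind="box": every patch pixel gets the same weight.
    """
    size = 2 * radius + 1
    if kind == "box":
        w = [[1.0] * size for _ in range(size)]
    elif kind == "gaussian":
        w = [[math.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))
              for dx in range(-radius, radius + 1)]
             for dy in range(-radius, radius + 1)]
    else:
        raise ValueError(f"unknown kernel: {kind}")
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def nl_means(img: Image, search_radius: int, patch_radius: int, h: float,
             kind: str = "gaussian") -> Image:
    """Pixelwise NL-Means on a grayscale image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    kern = patch_kernel(patch_radius, kind)

    def px(y: int, x: int) -> float:
        # Clamp coordinates so patches near the border stay inside the image.
        return img[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    def patch_dist(y1: int, x1: int, y2: int, x2: int) -> float:
        # Kernel-weighted squared distance between patches centred at the two pixels.
        d = 0.0
        for dy in range(-patch_radius, patch_radius + 1):
            for dx in range(-patch_radius, patch_radius + 1):
                diff = px(y1 + dy, x1 + dx) - px(y2 + dy, x2 + dx)
                d += kern[dy + patch_radius][dx + patch_radius] * diff * diff
        return d

    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            num = den = 0.0
            # Weighted average over the search window; similar patches get weights near 1.
            for yy in range(max(0, y - search_radius), min(rows, y + search_radius + 1)):
                for xx in range(max(0, x - search_radius), min(cols, x + search_radius + 1)):
                    w = math.exp(-patch_dist(y, x, yy, xx) / (h * h))
                    num += w * img[yy][xx]
                    den += w
            # den > 0 because the pixel always compares with itself (weight 1).
            out[y][x] = num / den
    return out
```

With the box kernel, every patch pixel carries equal weight. Patch distances can therefore, in principle, be updated with running sums or integral images instead of a full weighted sum for each pair. That is one plausible reason such a substitution runs faster, although this sketch does not exploit it.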
author2 Rathna, G N
author Antony, Daniel Sanju
title Performance Analysis of Non Local Means Algorithm using Hardware Accelerators
publishDate 2017
url http://etd.iisc.ernet.in/handle/2005/2932
http://etd.ncsi.iisc.ernet.in/abstracts/3794/G27791-Abs.pdf