Parallelization of the Estimation Algorithm of the 3D Structure Tensor

Bibliographic Details
Main Author: Alam, Ashraful
Format: Others
Language: English
Published: Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE) 2012
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-17494
Description
Summary: This thesis presents an implementation of the 3D structure tensor on the Ambric 2045, a Massively Parallel Processor Array (MPPA). The 3D structure tensor algorithm is often used in image processing applications to compute optical flow or to detect local 3D structures and their directions. The 3D structure tensor algorithm (3D-STA) consists of three main parts: gradient, tensor and smoothing. The algorithm is computationally expensive because many multiplications and additions are required to calculate the gradient (edge) and the tensor and to smooth every pixel of the image. As a result, it runs very slowly on a single processor, so parallelizing it is important for high-performance computation.

This thesis provides two parallel implementations of 3D-STA, one coarse-grained and one fine-grained. The Ambric has 336 processors; the coarse-grained implementation uses only 49 of them and the fine-grained implementation uses 165. The performance of the two implementations is measured using a video stream input consisting of a sequence of images of size 20x256x256. The coarse-grained implementation achieves 25 frames per second (fps) and the fine-grained implementation achieves 100 fps. Thus the fine-grained version is four times faster than the coarse-grained one.

Additionally, the results are compared, in terms of speed and efficiency, with a Matlab implementation running on an Intel(R) Core 2 Duo processor at 2.10 GHz and with another parallel optical flow implementation. The coarse-grained implementation is 58 times faster than the Matlab implementation and achieves approximately half the performance of the other parallel optical flow implementation. The fine-grained implementation, on the other hand, is 230 times faster than the Matlab implementation and more than twice as fast (100/43) as the other parallel optical flow implementation.

These performance results are satisfactory, and they indicate that our parallel implementations can be considered for real-time applications.
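
The three stages named in the summary (gradient, tensor, smoothing) can be illustrated with a minimal pure-Python sketch. It is not the thesis's Ambric code. The function name `structure_tensor_3d`, the central-difference gradient, the replicated borders and the box-average smoothing window are all assumptions chosen for brevity; the thesis may use different kernels, for example Gaussian ones.

```python
from itertools import product


def structure_tensor_3d(vol, radius=1):
    """Sketch of a 3D structure tensor for vol[t][y][x] (nested lists).

    Assumptions (not from the thesis): central-difference gradients,
    replicated borders, and a box average instead of a Gaussian.
    Returns a dict mapping (t, y, x) to the six unique tensor components
    (Jxx, Jyy, Jtt, Jxy, Jxt, Jyt).
    """
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])

    def clamp(i, n):
        # Replicate the border value for out-of-range indices.
        return min(max(i, 0), n - 1)

    def at(t, y, x):
        return vol[clamp(t, T)][clamp(y, H)][clamp(x, W)]

    # Stage 1 (gradient) and stage 2 (tensor): per-voxel derivative
    # estimates and the six unique products of their outer product.
    products = {}
    for t, y, x in product(range(T), range(H), range(W)):
        gx = (at(t, y, x + 1) - at(t, y, x - 1)) / 2.0
        gy = (at(t, y + 1, x) - at(t, y - 1, x)) / 2.0
        gt = (at(t + 1, y, x) - at(t - 1, y, x)) / 2.0
        products[t, y, x] = (gx * gx, gy * gy, gt * gt,
                             gx * gy, gx * gt, gy * gt)

    # Stage 3 (smoothing): average each component over a cubic window.
    offsets = range(-radius, radius + 1)
    smoothed = {}
    for t, y, x in product(range(T), range(H), range(W)):
        acc = [0.0] * 6
        count = 0
        for dt, dy, dx in product(offsets, repeat=3):
            p = products[clamp(t + dt, T), clamp(y + dy, H), clamp(x + dx, W)]
            for k in range(6):
                acc[k] += p[k]
            count += 1
        smoothed[t, y, x] = tuple(a / count for a in acc)
    return smoothed


# Usage: a 5x5x5 volume whose intensity rises by 2 per step along x.
vol = [[[2.0 * x for x in range(5)] for _ in range(5)] for _ in range(5)]
J = structure_tensor_3d(vol)
print(J[2, 2, 2])  # (4.0, 0.0, 0.0, 0.0, 0.0, 0.0)
```

In this sketch every output voxel depends only on a small neighbourhood of the input, so the work for different voxels is independent. That property is what makes this kind of algorithm a natural candidate for parallelization.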