Summary: | We present an integral image algorithm that can run in real-time on a Graphics Processing Unit (GPU). Our system exploits the parallelisms in computation via the NIVIDA CUDA programming model, which is a software platform for solving non-graphics problems in a massively parallel high-performance fashion. This implementation makes use of the work-efficient scan algorithm that is explicated in. Treating the rows and the columns of the target image as independent input arrays for the scan algorithm, our method manages to expose a second level of parallelism in the problem. We compare the performance of the parallel approach running on the GPU with the sequential CPU implementation across a range of image sizes and report a speed up by a factor of 8 for a 4 megapixel input. We further investigate the impact of using packed vector type data on the performance, as well as the effect of double precision arithmetic on the GPU.
|