Stream-based Packet Processing on General Purpose GPU Architecture


Bibliographic Details
Main Authors: Theophilus-Yohanis Hermanus, 魏特佑
Other Authors: Yu-Kuen Lai
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/18902792511191815664
Description
Summary: Master's thesis === Chung Yuan Christian University === Graduate Institute of Electronic Engineering === 99 === GPUs and other SIMD stream architectures have been used to accelerate packet processing applications. This thesis explores the parallel implementation of a sketch-based network traffic change detection application on a GPU, a multi-core CPU, and the Cell processor using the OpenCL parallel programming framework. Because the sketch data structure is inherently parallel, the sketch computations can be mapped to the OpenCL execution model on all three platforms. The sketch data structure is mapped to a buffer object in the device's global memory, and work-items operate on these sketches in parallel. Experimental results on a Radeon HD 5870 GPU show that the parallel implementation of these sketch operations reduces computation time compared with a sequential CPU implementation: the hash computation and the ESTIMATE operation achieve speedups of 15.3X and 9.1X, respectively. The kernel implementation reaches more than 50% (78.64 GB/s) of the peak memory bandwidth of the HD 5870 GPU. The results also show that the GPU is well suited to sketch computations from multiple monitors, and that data transfer from CPU to GPU is more efficient when more than one monitor is used. With 16 monitors, the transfer rate for moving keys from CPU memory to a buffer in GPU memory reaches 2.28 GB/s. On the multi-core CPU and the Cell processor, running the same kernels as on the GPU without any optimization, the ESTIMATE operation achieves speedups of 5.7X and 5.83X, respectively, over the sequential CPU implementation.
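
Illustrative note: the summary names a sketch data structure, a hash computation, and an ESTIMATE operation, but does not say which sketch variant the thesis uses. The following is a minimal sequential sketch, assuming a k-ary sketch with a median-of-rows estimator. The class name `KarySketch`, the hash family, and all parameters are hypothetical and are not taken from the thesis. Rows are independent of one another, which is the kind of parallelism that could be assigned to OpenCL work-items.

```python
import random
from statistics import median

# Mersenne prime used by the illustrative hash family (an assumption,
# not the hash function used in the thesis).
PRIME = (1 << 61) - 1


class KarySketch:
    """Hypothetical k-ary sketch: `rows` hash rows, each with `buckets` counters."""

    def __init__(self, rows=5, buckets=1024, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.buckets = buckets
        # One (a, b) pair per row for h(x) = ((a*x + b) mod PRIME) mod buckets.
        self.params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(rows)]
        self.table = [[0.0] * buckets for _ in range(rows)]
        self.total = 0.0  # sum of all values added so far

    def _bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % PRIME) % self.buckets

    def update(self, key, value):
        # Each row is touched independently; a parallel version would need
        # atomic adds or a reduction to avoid write conflicts on a bucket.
        for r in range(self.rows):
            self.table[r][self._bucket(r, key)] += value
        self.total += value

    def estimate(self, key):
        # Per-row unbiased estimate, then the median across rows.
        k = self.buckets
        per_row = [
            (self.table[r][self._bucket(r, key)] - self.total / k) / (1 - 1 / k)
            for r in range(self.rows)
        ]
        return median(per_row)
```

Usage under the same assumptions: after `s = KarySketch()` and `s.update(42, 100.0)`, calling `s.estimate(42)` returns a value close to 100. Because the sketch is linear, a change-detection pipeline can subtract a forecast sketch from an observed sketch bucket by bucket and run the estimate on the difference.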