Cache Design and Techniques for Improving Packet Classification Performance on Network Processor

Bibliographic Details
Main Authors: Fang-Chen Kuo, 郭芳辰
Other Authors: Yeim-Kuan Chang
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/67621888954483984921
Description
Summary: Ph.D. === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === Academic Year 101 === Packet classification is the core operation by which routers assign incoming packets to different flows. In this thesis, we implement several notable hierarchical and decision-tree-based packet classification algorithms, namely extended grid of tries (EGT), hierarchical intelligent cuttings (HiCuts), HyperCuts, and hierarchical binary search (HBS), on the IXP2400 and IXP2800 network processors. We find that, even when all available processing microengines (MEs) are used, none of these existing algorithms achieves the OC-48 line speed on the IXP2400 or the OC-192 line speed on the IXP2800. Because IXP network processors have no built-in caches for fast-path processing in the MEs, we propose software cache designs that exploit the temporal locality of packets to improve the search speed of these algorithms. Two approaches are proposed. First, we propose hint-based cache designs that shorten the search of the packet classification data structure when cache misses occur; both header caches and prefix caches are studied. Although the proposed cache schemes are designed for all dimension-by-dimension packet classification schemes, they are best suited to HBS. Our performance simulations show that HBS enhanced with the proposed cache schemes performs best in terms of classification speed and number of memory accesses when its memory requirement is in the same range as those of HiCuts and HyperCuts. In experiments with both high-locality and low-locality packet traces, five MEs are sufficient for the proposed rule cache with hints to achieve the OC-48 line speed on the IXP2400. The second approach, the Delay Processing (DP) mechanism, addresses performance degradation during bursty traffic by exploiting the multithreading of network processors.
The DP mechanism uses a pending table to keep subsequent threads from repeating computations that an earlier thread is still performing. As a result, any packet-processing task with high locality can avoid duplicate computations and achieve a higher packet-processing rate. Experimental results show that the DP mechanism further improves the achievable throughput on the network processor we used.
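To illustrate the header cache idea, the following Python sketch places a direct-mapped software cache, keyed by the exact packet header, in front of a slower classifier. It is a minimal sketch under stated assumptions, not the thesis's design: the 5-tuple key, the `HeaderCache` class, the hash-based indexing, and the toy `slow_classify` function are all hypothetical, and the hint mechanism that speeds up searches on a miss is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical 5-tuple key: (src_ip, dst_ip, src_port, dst_port, protocol)
Header = Tuple[int, int, int, int, int]


@dataclass
class CacheEntry:
    header: Header  # full header kept as the tag so collisions are detected
    rule_id: int    # cached classification result (matched rule index)


class HeaderCache:
    """Direct-mapped software header cache in front of a slower classifier.

    Assumptions: one entry per set, replace on every miss, and no hint-based
    miss handling (the thesis's hints are not modeled here).
    """

    def __init__(self, num_sets: int, classify: Callable[[Header], int]) -> None:
        self.num_sets = num_sets
        self.sets: List[Optional[CacheEntry]] = [None] * num_sets
        self.classify = classify  # full search, e.g. an HBS or HiCuts lookup
        self.hits = 0
        self.misses = 0

    def _index(self, header: Header) -> int:
        # Python's tuple hash stands in for whatever hash an ME would use.
        return hash(header) % self.num_sets

    def lookup(self, header: Header) -> int:
        idx = self._index(header)
        entry = self.sets[idx]
        if entry is not None and entry.header == header:
            self.hits += 1  # repeated packets of the same flow hit here
            return entry.rule_id
        self.misses += 1
        rule_id = self.classify(header)               # slow path: full search
        self.sets[idx] = CacheEntry(header, rule_id)  # fill, evicting any old entry
        return rule_id


def slow_classify(h: Header) -> int:
    # Toy classifier: rule 1 for TCP (protocol 6), rule 0 otherwise.
    return 1 if h[4] == 6 else 0


cache = HeaderCache(num_sets=64, classify=slow_classify)
flow_a = (0x0A000001, 0x0A000002, 1234, 80, 6)
flow_b = (0x0A000003, 0x0A000002, 5353, 53, 17)
results = [cache.lookup(h) for h in [flow_a, flow_a, flow_a, flow_b]]
print(results, cache.hits, cache.misses)  # [1, 1, 1, 0] 2 2
```

A direct-mapped layout is used here only because it needs a single probe per lookup; associativity, replacement policy, and prefix caching are left out for brevity.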
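The pending-table idea behind DP can be sketched in Python, with OS threads standing in for ME hardware threads. In this minimal sketch, which is an assumption rather than the thesis's implementation, the first thread to request a key performs the lookup, while later threads requesting the same key are delayed until that result is ready. `PendingTable`, `slow_lookup`, and the `gate` event that holds the first lookup in flight are hypothetical names.

```python
import threading
import time
from typing import Callable, Dict, Hashable


class _Slot:
    """One in-flight computation; delayed threads block on `done`."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value = 0


class PendingTable:
    """Merges identical requests that arrive while a lookup is in flight.

    Assumption: compute() does not raise. Entries are dropped as soon as the
    result is ready, so this removes duplicate in-flight work, not a cache.
    """

    def __init__(self, compute: Callable[[Hashable], int]) -> None:
        self.compute = compute
        self.lock = threading.Lock()
        self.pending: Dict[Hashable, _Slot] = {}
        self.computations = 0  # times compute() actually ran
        self.delayed = 0       # requests served by waiting for another thread

    def get(self, key: Hashable) -> int:
        with self.lock:
            slot = self.pending.get(key)
            owner = slot is None
            if owner:
                slot = _Slot()
                self.pending[key] = slot
            else:
                self.delayed += 1
        if not owner:
            slot.done.wait()  # delayed: reuse the earlier thread's result
            return slot.value
        slot.value = self.compute(key)
        with self.lock:
            self.computations += 1
            del self.pending[key]  # later requests start a fresh lookup
        slot.done.set()
        return slot.value


gate = threading.Event()


def slow_lookup(key: Hashable) -> int:
    gate.wait()             # keep the first lookup "in flight"
    return hash(key) % 100  # stand-in for a classification result


table = PendingTable(slow_lookup)
key = (0x0A000001, 0x0A000002, 1234, 80, 6)
out = []
threads = [threading.Thread(target=lambda: out.append(table.get(key))) for _ in range(8)]
for t in threads:
    t.start()
while table.delayed < 7:  # wait until 7 threads are queued behind the first
    time.sleep(0.001)
gate.set()
for t in threads:
    t.join()
print(table.computations, table.delayed, len(set(out)))  # 1 7 1
```

Removing each entry once its result is ready is a deliberate choice in this sketch: it models only the merging of concurrent duplicates, which is what helps during bursts of packets from the same flow, and leaves longer-term reuse to a separate cache.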