Summary: | Master's === National Cheng Kung University === Department of Computer Science and Information Engineering === 104 === Packet classification is an essential component of today's network architecture: it supports packet forwarding, Quality of Service (QoS), firewalls, traffic control, and virtual private networks (VPNs). With the growth of the Internet and the emergence of software-defined networking (SDN), methods designed for the traditional 5-dimensional rule sets cannot efficiently process current rule sets whose rules have 12 or more dimensions. The main problem is how to process such rule sets while still achieving high throughput. To this end, several methods have been implemented on GPUs: some use a single hash table for searching ([11], [21]), others use a binary range tree ([6], [10], [20]), and [22] combines a hash-based method with the tuple space method. However, in 12-dimensional rule sets the seven new fields are all either exact values or wildcards, a property that makes a single hash table or a binary range tree inefficient. A further problem is that, when computing on an ordinary GPU, the input data and results must be transferred over the PCI-E bus, whose latency is a major bottleneck.
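(Illustrative aside, not from the thesis: the sketch below shows why a single hash table struggles with exact-value-or-wildcard fields. A rule's hash key can only cover its non-wildcard fields, so a lookup must re-probe once per possible wildcard pattern; with seven such fields that is up to 2^7 = 128 probes per packet in the worst case. All names in this sketch are our own.)

    /* Sketch: the cost a single hash table pays for wildcard fields. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_NEW_FIELDS 7   /* the exact-value-or-wildcard fields */

    /* FNV-1a over the fields NOT wildcarded under `mask` (bit i set =>
     * field i is "don't care" and cannot contribute to the key). */
    static uint32_t masked_hash(const uint32_t f[NUM_NEW_FIELDS], unsigned mask)
    {
        uint32_t h = 2166136261u;
        for (int i = 0; i < NUM_NEW_FIELDS; i++)
            if (!(mask & (1u << i)))
                h = (h ^ f[i]) * 16777619u;
        return h;
    }

    int main(void)
    {
        uint32_t pkt[NUM_NEW_FIELDS] = {6, 17, 80, 443, 1, 0, 4};
        unsigned probes = 0;

        /* The packet cannot know which fields a matching rule wildcards,
         * so every wildcard pattern must be probed separately. */
        for (unsigned mask = 0; mask < (1u << NUM_NEW_FIELDS); mask++) {
            (void)masked_hash(pkt, mask);   /* one table probe per pattern */
            probes++;
        }
        printf("worst-case probes per packet: %u\n", probes);   /* 128 */
        return 0;
    }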
In this thesis, we propose a modified hash table to handle the exact-value-or-wildcard fields and use a compression method to reduce memory consumption. Furthermore, we implement this method on an APU, whose Heterogeneous System Architecture (HSA) eliminates the bus delay between host and device. According to the experimental results on an AMD A10-7850 APU, our method achieves a throughput of 1586 to 1983 MPPS (Million Packets Per Second) on rule sets containing 12K 12-dimensional rules, while consuming 38 MB of memory. This is 10 times the throughput of the same method implemented on a legacy (discrete) GPU, higher than the 1250 MPPS achieved by an FPGA-based method ([3]), and 10 to 40 times the throughput of other GPU-based methods ([6], [11]).
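(Illustrative aside, not the thesis' implementation: one way to obtain the zero-copy behavior that HSA-capable APUs allow is OpenCL's CL_MEM_ALLOC_HOST_PTR allocation, which on an APU typically places a buffer in memory addressable by both the CPU and the integrated GPU, so mapping it moves no data over a bus. The function name and buffer handling below are our own sketch; error checks are trimmed.)

    /* Hedged sketch: zero-copy packet buffer setup on an APU via OpenCL. */
    #include <CL/cl.h>
    #include <stdint.h>
    #include <string.h>

    cl_mem create_packet_buffer(cl_context ctx, cl_command_queue q,
                                const uint8_t *packets, size_t nbytes)
    {
        cl_int err;

        /* Shared (zero-copy) allocation instead of a discrete-GPU buffer. */
        cl_mem buf = clCreateBuffer(ctx,
                                    CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR,
                                    nbytes, NULL, &err);

        /* Map, fill in place, unmap: on an APU this is a pointer exchange,
         * not a PCI-E transfer.  In a real pipeline the capture loop would
         * write packets directly into the mapped region; the memcpy here
         * merely stands in for that step. */
        void *p = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                     0, nbytes, 0, NULL, NULL, &err);
        memcpy(p, packets, nbytes);
        clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
        (void)err;

        return buf;
    }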
|