High Performance Packet Processing on Multi-queue and Multi-core Platforms

博士 === 國立清華大學 === 通訊工程研究所 === 103 === Advances in semiconductor technology are making way for multi-core and many-core processors that incorporate tens to hundreds of cores in a single package. Meanwhile, network interface cards (NICs) featuring multiple hardware reception (Rx) and transmission (Tx) qu...

Bibliographic Details
Main Authors:	Tsai, Wenyen, 蔡文嚴
Other Authors:	Huang, Nen Fu
Format:	Others
Language:	en_US
Published:	2015
Online Access:	http://ndltd.ncl.edu.tw/handle/18774014002518328624

id	ndltd-TW-103NTHU5650125
record_format	oai_dc
spelling	ndltd-TW-103NTHU5650125 2016-11-20T04:18:16Z http://ndltd.ncl.edu.tw/handle/18774014002518328624 High Performance Packet Processing on Multi-queue and Multi-core Platforms 於具備多佇列網路卡的多核心平台上對高效能封包處理之研究 Tsai, Wenyen 蔡文嚴 博士 國立清華大學 通訊工程研究所 103 Advances in semiconductor technology are making way for multi-core and many-core processors that incorporate tens to hundreds of cores in a single package. Meanwhile, network interface cards (NICs) featuring multiple hardware reception (Rx) and transmission (Tx) queues are the networking community's response to the prevalence of multi-core computing. Multi-queue networking avoids the performance degradation caused by multiple cores contending for a single Rx/Tx queue by distributing packets across multiple queues. In addition, thanks to evolving interrupt handling techniques, a NIC can now be allocated enough interrupt resources for each of its queues to be associated with a dedicated core. Although the number of cores in CPUs continues to climb, many difficulties remain in building systems that are capable of keeping up with the packet volume of a modern medium- to large-scale network deployment. This is due to several factors, including the ever-increasing rate of network traffic (e.g., the now prevalent 10 Gbps, the cutting-edge 40 Gbps, and the upcoming 100 Gbps NICs) and some fundamental limitations in both software and hardware architectures. Software-imposed synchronization overheads in multi-core programming, such as atomic operations and locking, play a critical role in packet processing performance. On the hardware side, architectural complications such as cache coherency and NUMA effects bring new challenges that require developers to acquire new skills to unleash the real computing power.
Correspondingly, researchers attack these challenges with a hardware and software co-design approach that starts by investigating the underlying hardware in order to collect the knowledge needed to facilitate software development and enable optimization. In this dissertation, we focus on two problems: 1) reducing lock contention when performing session tracking and 2) affinitizing interrupts from multi-queue NICs to CPU cores with the objective of maximizing packet processing performance. For the first problem, we propose a simple partitioning scheme that aims to strike a balance between excessive locking and lockless manipulation. A resource balancing mechanism is also given to prevent underutilization of session tracking resources under unbalanced traffic loads. Its effectiveness is demonstrated by improved performance as the number of cores contending for a single lock decreases. For the second problem, interrupt affinitization, an algorithmic approach based on a numerical cost model is proposed to find the best affinitization. Comprehensive experiments covering 1G and 10G NICs with four networking applications ranging from L2 to L7 are conducted to demonstrate its effectiveness. Huang, Nen Fu 黃能富 2015 學位論文 ; thesis 104 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	博士 === 國立清華大學 === 通訊工程研究所 === 103 === Advances in semiconductor technology are making way for multi-core and many-core processors that incorporate tens to hundreds of cores in a single package. Meanwhile, network interface cards (NICs) featuring multiple hardware reception (Rx) and transmission (Tx) queues are the networking community's response to the prevalence of multi-core computing. Multi-queue networking avoids the performance degradation caused by multiple cores contending for a single Rx/Tx queue by distributing packets across multiple queues. In addition, thanks to evolving interrupt handling techniques, a NIC can now be allocated enough interrupt resources for each of its queues to be associated with a dedicated core. Although the number of cores in CPUs continues to climb, many difficulties remain in building systems that are capable of keeping up with the packet volume of a modern medium- to large-scale network deployment. This is due to several factors, including the ever-increasing rate of network traffic (e.g., the now prevalent 10 Gbps, the cutting-edge 40 Gbps, and the upcoming 100 Gbps NICs) and some fundamental limitations in both software and hardware architectures. Software-imposed synchronization overheads in multi-core programming, such as atomic operations and locking, play a critical role in packet processing performance. On the hardware side, architectural complications such as cache coherency and NUMA effects bring new challenges that require developers to acquire new skills to unleash the real computing power. Correspondingly, researchers attack these challenges with a hardware and software co-design approach that starts by investigating the underlying hardware in order to collect the knowledge needed to facilitate software development and enable optimization.
In this dissertation, we focus on two problems: 1) reducing lock contention when performing session tracking and 2) affinitizing interrupts from multi-queue NICs to CPU cores with the objective of maximizing packet processing performance. For the first problem, we propose a simple partitioning scheme that aims to strike a balance between excessive locking and lockless manipulation. A resource balancing mechanism is also given to prevent underutilization of session tracking resources under unbalanced traffic loads. Its effectiveness is demonstrated by improved performance as the number of cores contending for a single lock decreases. For the second problem, interrupt affinitization, an algorithmic approach based on a numerical cost model is proposed to find the best affinitization. Comprehensive experiments covering 1G and 10G NICs with four networking applications ranging from L2 to L7 are conducted to demonstrate its effectiveness.
author2	Huang, Nen Fu
author_facet	Huang, Nen Fu Tsai, Wenyen 蔡文嚴
author	Tsai, Wenyen 蔡文嚴
spellingShingle	Tsai, Wenyen 蔡文嚴 High Performance Packet Processing on Multi-queue and Multi-core Platforms
author_sort	Tsai, Wenyen
title	High Performance Packet Processing on Multi-queue and Multi-core Platforms
title_short	High Performance Packet Processing on Multi-queue and Multi-core Platforms
title_full	High Performance Packet Processing on Multi-queue and Multi-core Platforms
title_fullStr	High Performance Packet Processing on Multi-queue and Multi-core Platforms
title_full_unstemmed	High Performance Packet Processing on Multi-queue and Multi-core Platforms
title_sort	high performance packet processing on multi-queue and multi-core platforms
publishDate	2015
url	http://ndltd.ncl.edu.tw/handle/18774014002518328624
work_keys_str_mv	AT tsaiwenyen highperformancepacketprocessingonmultiqueueandmulticoreplatforms AT càiwényán highperformancepacketprocessingonmultiqueueandmulticoreplatforms AT tsaiwenyen yújùbèiduōzhùlièwǎnglùkǎdeduōhéxīnpíngtáishàngduìgāoxiàonéngfēngbāochùlǐzhīyánjiū AT càiwényán yújùbèiduōzhùlièwǎnglùkǎdeduōhéxīnpíngtáishàngduìgāoxiàonéngfēngbāochùlǐzhīyánjiū
_version_	1718395899206434816

Similar Items