High Performance Packet Processing on Multi-queue and Multi-core Platforms
Doctoral === National Tsing Hua University === Institute of Communications Engineering === 103 === Advances in semiconductor technology are making way for multi-core and many-core processors that incorporate tens to hundreds of cores in a single package. Meanwhile, network interface cards (NICs) featuring multiple hardware reception (Rx) and transmission (Tx) qu...
Main Authors: | Tsai, Wenyen, 蔡文嚴 |
---|---|
Other Authors: | Huang, Nen Fu |
Format: | Others |
Language: | en_US |
Published: | 2015 |
Online Access: | http://ndltd.ncl.edu.tw/handle/18774014002518328624 |
id |
ndltd-TW-103NTHU5650125 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-103NTHU5650125 2016-11-20T04:18:16Z http://ndltd.ncl.edu.tw/handle/18774014002518328624 High Performance Packet Processing on Multi-queue and Multi-core Platforms 於具備多佇列網路卡的多核心平台上對高效能封包處理之研究 Tsai, Wenyen 蔡文嚴 Doctoral National Tsing Hua University Institute of Communications Engineering 103 Huang, Nen Fu 黃能富 2015 學位論文 ; thesis 104 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Tsing Hua University === Institute of Communications Engineering === 103 === Advances in semiconductor technology are making way for multi-core and many-core processors that incorporate tens to hundreds of cores in a single package. Meanwhile, network interface cards (NICs) featuring multiple hardware reception (Rx) and transmission (Tx) queues are the networking community's response to the prevalence of multi-core computing. Multi-queue networking avoids the performance degradation caused by multiple cores contending for a single Rx/Tx queue by distributing packets across multiple queues. In addition, thanks to evolving interrupt handling techniques, a NIC can now be allocated enough interrupt resources for each of its queues to be associated with a dedicated core.
Although the number of cores in CPUs continues to climb, many difficulties remain in building systems capable of keeping up with the packet volume of a modern medium- to large-scale network deployment. This is due to several factors, including the ever-increasing rate of network traffic (e.g., the now prevalent 10 Gbps, the cutting-edge 40 Gbps, and the upcoming 100 Gbps NICs) and some fundamental limitations in both software and hardware architectures. On the software side, the synchronization overheads of multi-core programming, such as atomic operations and locking, play a critical role in packet processing performance. On the hardware side, architectural complications such as cache coherency and NUMA effects bring new challenges that require developers to acquire new skills in order to unleash the real computing power.
Correspondingly, researchers attack these challenges with a hardware and software co-design approach that starts by investigating the underlying hardware, collecting the knowledge needed to facilitate software development and enable optimization. In this dissertation, we focus on two problems: 1) reducing lock contention when performing session tracking and 2) affinitizing interrupts from multi-queue NICs to CPU cores with the objective of maximizing packet processing performance.
For the first problem, we propose a simple partitioning scheme that aims to strike a balance between excessive locking and lockless manipulation. A resource balancing mechanism is also given to prevent underutilization of session tracking resources under unbalanced traffic loads. Its effectiveness is demonstrated by improved performance as the number of cores that contend for a single lock decreases. For the second problem, interrupt affinitization, we propose an algorithmic approach based on a numerical cost model to find the best affinitization. Comprehensive experiments covering 1G and 10G NICs with four networking applications ranging from L2 to L7 are conducted to demonstrate its effectiveness.
|
author2 |
Huang, Nen Fu |
author_facet |
Huang, Nen Fu Tsai, Wenyen 蔡文嚴 |
author |
Tsai, Wenyen 蔡文嚴 |
spellingShingle |
Tsai, Wenyen 蔡文嚴 High Performance Packet Processing on Multi-queue and Multi-core Platforms |
author_sort |
Tsai, Wenyen |
title |
High Performance Packet Processing on Multi-queue and Multi-core Platforms |
title_short |
High Performance Packet Processing on Multi-queue and Multi-core Platforms |
title_full |
High Performance Packet Processing on Multi-queue and Multi-core Platforms |
title_fullStr |
High Performance Packet Processing on Multi-queue and Multi-core Platforms |
title_full_unstemmed |
High Performance Packet Processing on Multi-queue and Multi-core Platforms |
title_sort |
high performance packet processing on multi-queue and multi-core platforms |
publishDate |
2015 |
url |
http://ndltd.ncl.edu.tw/handle/18774014002518328624 |
work_keys_str_mv |
AT tsaiwenyen highperformancepacketprocessingonmultiqueueandmulticoreplatforms AT càiwényán highperformancepacketprocessingonmultiqueueandmulticoreplatforms AT tsaiwenyen yújùbèiduōzhùlièwǎnglùkǎdeduōhéxīnpíngtáishàngduìgāoxiàonéngfēngbāochùlǐzhīyánjiū AT càiwényán yújùbèiduōzhùlièwǎnglùkǎdeduōhéxīnpíngtáishàngduìgāoxiàonéngfēngbāochùlǐzhīyánjiū |
_version_ |
1718395899206434816 |