EQueue: Elastic Lock-Free FIFO Queue for Core-to-Core Communication on Multi-Core Processors

In recent years, the number of CPU cores in a multi-core processor keeps increasing. To leverage the increasing hardware resource, programmers need to develop parallelized software programs. One promising approach to parallelizing high-performance applications is pipeline parallelism, which divides...

Full description

Bibliographic Details
Main Authors: Junchang Wang, Yangfeng Tian, Xiong Fu
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9099229/
Description
Summary:In recent years, the number of CPU cores in a multi-core processor keeps increasing. To leverage the increasing hardware resource, programmers need to develop parallelized software programs. One promising approach to parallelizing high-performance applications is pipeline parallelism, which divides a task into a serial of subtasks and then maps these subtasks to a group of CPU cores, making the communication scheme between the subtasks running on different cores a critical component for the parallelized programs. One widely-used implementation of the communication scheme is software-based, lock-free first-in-first-out queues that move data between different subtasks. The primary design goal of the prior lock-free queues was higher throughput, such that the technique of batching data was heavily used in their enqueue and dequeue operations. Unfortunately, a lock-free queue with batching heavily depends on the assumption that data arrive at a constant rate, and the queue is in an equilibrium state. Experimentally we found that the equilibrium state of a queue rarely happens in real, high-performance use cases (e.g., 10Gbps+ network applications) because data arriving rate fluctuates sharply. As a result, existing queues suffer from performance degradation when used in real applications on multi-core processors. In this paper, we present EQueue, a lock-free queue to handle this robustness issue in existing queues. EQueue is lock-free, efficient, and robust. EQueue can adaptively (1) shrink its queue size when data arriving rate is low to keep its memory footprint small to utilize CPU cache better, and (2) enlarge its queue size to avoid overflow when data arriving rate is in burstiness. Experimental results show that when used in high-performance applications, EQueue can always perform an enqueue/dequeue operation in less than 50 CPU cycles, which outperforms FastForward and MCRingBuffer, two state-of-the-art queues, by factors 3 and 2, respectively.
ISSN:2169-3536