Summary: | Master's === National Chi Nan University === Department of Information Management === 107 === NUMA multi-core systems divide system resources into several nodes and are therefore more scalable. When load imbalance occurs among cores, the load balancing mechanism of the kernel scheduler is triggered to migrate processes between cores, even across NUMA nodes. After an inter-node migration, remote memory access may occur, which degrades system performance. To maintain load balance while reducing remote memory access, previous research proposed the kernel-level Memory-aware Load Balancing (kMLB) mechanism to enhance the inter-node load balancing of the Linux kernel. It tracks the number of memory pages occupied by each task on each NUMA node and devises several task selection policies. These policies use this information to select the task whose inter-node migration is expected to reduce remote memory access the most.
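As a rough illustration of how per-node page counts could drive task selection, the following user-space Python sketch scores each movable task and picks the highest scorer. The scoring rule (pages on the destination node minus pages on the source node) and the names `migration_benefit` and `select_task` are assumptions made for this example only; the abstract does not give the exact formula used by kMLB or its policies.

```python
from typing import Dict, Optional


def migration_benefit(pages_per_node: Dict[int, int], src: int, dst: int) -> int:
    """Assumed score: pages that become local after the move (on dst)
    minus pages that become remote after the move (on src)."""
    return pages_per_node.get(dst, 0) - pages_per_node.get(src, 0)


def select_task(tasks: Dict[str, Dict[int, int]], src: int, dst: int) -> Optional[str]:
    """Return the movable task with the highest assumed benefit, or None if empty."""
    best_name: Optional[str] = None
    best_score: Optional[int] = None
    for name, pages in tasks.items():
        score = migration_benefit(pages, src, dst)
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical page counts per NUMA node for three movable tasks on node 0.
tasks = {
    "A": {0: 900, 1: 100},  # mostly local to the source node
    "B": {0: 200, 1: 800},  # mostly resident on the destination node
    "C": {0: 500, 1: 500},  # evenly split
}
chosen = select_task(tasks, src=0, dst=1)
```

Under this assumed score, task B is chosen because most of its pages already reside on the destination node, so moving it turns more remote accesses into local ones than moving A or C.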
In this study, we focus on inter-node load balancing for multi-threaded processes. In the Linux kernel, the threads of a multi-threaded process form a thread group and share one memory space. However, these threads may be scheduled to run on different NUMA nodes, and so may incur different amounts of remote memory access. We find that the previously proposed Most Benefit policy, which uses the kMLB mechanism, is also appropriate for multi-threaded processes. In addition, we propose a new task selection policy that does not require the kMLB mechanism; it considers, for each movable task's thread group, the distribution of the group's threads across the NUMA nodes. The task whose thread group has the least exclusivity of thread distribution is selected, since migrating it is expected to have the least influence on the data mapping and thread mapping of its thread group.
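The thread-distribution policy can be sketched in the same user-space style. The abstract does not define "exclusivity", so this example assumes one plausible reading: the largest share of a thread group's threads that sits on any single NUMA node. The names `exclusivity` and `select_least_exclusive` are hypothetical, and the actual metric in the thesis may differ.

```python
from typing import Dict, Optional


def exclusivity(threads_per_node: Dict[int, int]) -> float:
    """Assumed metric: fraction of the group's threads on its most populated node.
    1.0 means all threads sit on one node; lower values mean a wider spread."""
    total = sum(threads_per_node.values())
    if total == 0:
        return 0.0
    return max(threads_per_node.values()) / total


def select_least_exclusive(candidates: Dict[str, Dict[int, int]]) -> Optional[str]:
    """Pick the movable task whose thread group is least concentrated on one node."""
    if not candidates:
        return None
    return min(candidates, key=lambda task: exclusivity(candidates[task]))


# Hypothetical movable tasks, each mapped to its thread group's per-node thread counts.
candidates = {
    "t1": {0: 4, 1: 0},  # group entirely on node 0
    "t2": {0: 2, 1: 2},  # group evenly spread
    "t3": {0: 3, 1: 1},  # group mostly on node 0
}
picked = select_least_exclusive(candidates)
```

With these counts the sketch picks `t2`: its group is already spread evenly across both nodes, so moving one of its threads changes the group's placement less than pulling a thread out of a group that is concentrated on a single node.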
On the other hand, although selecting suitable tasks for inter-node migration can reduce remote memory access, the load balancer has to evaluate every movable task in the runqueue, which incurs overhead. We therefore apply methods that skip superfluous evaluations for multi-threaded processes and make the selection procedure more efficient. Experimental results with the widely used PARSEC 3.0 Benchmark Suite show that our modified Linux kernel, using various task selection policies, achieves up to 11.1% performance improvement over the unmodified Linux kernel.
|