Thread-Aware Mechanism to Enhance Inter-Node Load Balancing for Multithreaded Applications on NUMA Systems

Bibliographic Details
Main Authors: Mei-Ling Chiang, Wei-Lun Su
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/11/14/6486
id doaj-b00008107ac4495c9143083a04e2dfcc
doi 10.3390/app11146486
volume 11
article_number 6486
author_affiliation Mei-Ling Chiang: Department of Information Management, National Chi Nan University, Puli 54516, Taiwan
Wei-Lun Su: Department of Information Management, National Chi Nan University, Puli 54516, Taiwan
collection DOAJ
issn 2076-3417
description NUMA multi-core systems divide system resources into several nodes. When the load between cores becomes imbalanced, the kernel scheduler’s load balancing mechanism migrates threads between cores or across NUMA nodes. A thread migrated to another node must use remote memory access to reach memory on its previous node, which degrades performance. Threads to be migrated must be selected effectively and efficiently because the related operations run in the critical path of the kernel scheduler. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that considers how the threads of each thread group are distributed across nodes when migrating one thread for inter-node load balancing. The policy selects the thread whose thread group has the least exclusive thread distribution, that is, whose members are spread more evenly across nodes, so that the migration has less influence on data mapping and thread mapping for the thread group. We further devise several enhancements that eliminate superfluous evaluations for multithreaded processes and make the selection procedure more efficient. Experimental results on the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy increases performance by 10.7% compared with the unmodified Linux kernel.
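The selection idea in the description can be illustrated with a small user-space sketch. This is not the authors' kernel code: the abstract does not define how "exclusive" a thread distribution is, so the sketch assumes a hypothetical score (the largest share of a thread group's members found on any single node) and picks, from the busiest node, a thread whose group has the lowest score. The names `Thread`, `exclusivity`, and `pick_thread_to_migrate` are invented for illustration, and the per-group score cache only loosely mirrors the idea of avoiding repeated evaluations for threads of the same process.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    tid: int   # thread id
    tgid: int  # thread group (process) id
    node: int  # NUMA node the thread currently runs on


def exclusivity(tgid, threads):
    """Assumed metric: largest fraction of the group's threads on one node.

    1.0 means the whole group sits on a single node; lower values mean the
    members are spread more evenly across nodes.
    """
    per_node = Counter(t.node for t in threads if t.tgid == tgid)
    total = sum(per_node.values())
    return max(per_node.values()) / total


def pick_thread_to_migrate(threads, busiest_node):
    """Pick one thread on busiest_node whose group is least exclusive."""
    candidates = [t for t in threads if t.node == busiest_node]
    if not candidates:
        return None
    # Score each thread group once, not once per member thread.
    scores = {}
    for t in candidates:
        if t.tgid not in scores:
            scores[t.tgid] = exclusivity(t.tgid, threads)
    # Ties are broken by the lowest tid to keep the choice deterministic.
    return min(candidates, key=lambda t: (scores[t.tgid], t.tid))


if __name__ == "__main__":
    threads = [
        Thread(1, 100, 0), Thread(2, 100, 0), Thread(3, 100, 0),
        Thread(4, 200, 0), Thread(5, 200, 1),
    ]
    print(pick_thread_to_migrate(threads, busiest_node=0))
```

In this toy input, group 100 is entirely on node 0 while group 200 is already split across two nodes, so the sketch prefers a thread from group 200 and leaves the tightly co-located group intact.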
topic NUMA
Linux kernel
multithreaded
load balancing
remote memory access