A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU


Bibliographic Details
Main Authors: Fukumoto, N. (Author), Honda, T. (Author), Kawakami, K. (Author), Kurihara, K. (Author), Yamazaki, M. (Author)
Format: Article
Language: English
Published: Institute of Electronics, Information and Communication Engineers, 2022
Description
Summary: To accelerate deep learning (DL) processing on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture, whereas the A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically generates the executable code for its computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, a Just-In-Time (JIT) assembler for the x86_64 architecture. To port oneDNN to A64FX, these kernels must be rewritten in Armv8-A instructions using Xbyak_aarch64, a JIT assembler for the Armv8-A architecture. This is challenging because the code to be rewritten runs to several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that converts, at runtime, dynamically generated executable code for the x86_64 architecture into executable code for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code and thus allows oneDNN to be ported to A64FX quickly. Copyright © 2022 The Institute of Electronics, Information and Communication Engineers.
ISSN: 0916-8524
DOI: 10.1587/TRANSELE.2021LHP0001
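
Illustrative sketch (not part of the catalog record): the summary describes translating JIT-generated x86_64 code into Armv8-A code at runtime. The toy Python below shows only the general idea of instruction-by-instruction translation and is not the authors' implementation. The real tool operates on encoded machine code; this sketch works on symbolic (opcode, destination, source) tuples for simplicity. The register map, the scratch register, and the three supported opcodes are all assumptions chosen for illustration.

```python
# Toy illustration only: the register map, scratch register, and supported
# opcodes are assumptions, not details of the translator in the summary.

# Hypothetical mapping from x86_64 register names to AArch64 register names.
REG_MAP = {"rax": "x0", "rbx": "x1", "rcx": "x2", "rdx": "x3"}
SCRATCH = "x9"  # assumed temporary register used for memory operands


def translate(insn):
    """Translate one x86_64-style (opcode, dst, src) tuple to AArch64 text lines."""
    op, dst, src = insn
    d = REG_MAP[dst]
    is_mem = src.startswith("[") and src.endswith("]")
    if op == "mov":
        if is_mem:
            # A register load from memory becomes an explicit ldr.
            return [f"ldr {d}, [{REG_MAP[src[1:-1]]}]"]
        return [f"mov {d}, {REG_MAP[src]}"]
    if op in ("add", "sub"):
        if is_mem:
            # AArch64 is a load/store architecture: load first, then operate.
            return [
                f"ldr {SCRATCH}, [{REG_MAP[src[1:-1]]}]",
                f"{op} {d}, {d}, {SCRATCH}",
            ]
        # x86_64 "op dst, src" means dst = dst op src; AArch64 names all three.
        return [f"{op} {d}, {d}, {REG_MAP[src]}"]
    raise NotImplementedError(f"unsupported opcode: {op}")


def translate_block(insns):
    """Translate a sequence of instructions, preserving their order."""
    out = []
    for insn in insns:
        out.extend(translate(insn))
    return out
```

For example, `translate_block([("mov", "rax", "[rbx]"), ("add", "rax", "rcx"), ("sub", "rax", "[rdx]")])` yields four AArch64 lines, because the `sub` with a memory operand expands into a load followed by a register-only subtract. One x86_64 instruction can therefore map to several Armv8-A instructions.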