Summary: | To accelerate deep learning (DL) processing on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture, whereas the A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically generates the executable code for its computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, a Just-In-Time (JIT) assembler for the x86_64 architecture. Porting oneDNN to A64FX by hand would require rewriting these kernels in Armv8-A instructions using Xbyak_aarch64, a JIT assembler for the Armv8-A architecture. This is challenging because the code to be rewritten exceeds several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that converts, at runtime, dynamically generated executable code for the x86_64 architecture into executable code for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code and thus allows oneDNN to be ported to A64FX quickly. Copyright © 2022 The Institute of Electronics, Information and Communication Engineers.
|