The GPU version of LASG/IAP Climate System Ocean Model version 3 (LICOM3) under the heterogeneous-compute interface for portability (HIP) framework and its large-scale application



Bibliographic Details
Main Authors: P. Wang, J. Jiang, P. Lin, M. Ding, J. Wei, F. Zhang, L. Zhao, Y. Li, Z. Yu, W. Zheng, Y. Yu, X. Chi, H. Liu
Format: Article
Language: English
Published: Copernicus Publications 2021-05-01
Series: Geoscientific Model Development
Online Access: https://gmd.copernicus.org/articles/14/2781/2021/gmd-14-2781-2021.pdf
Description
Summary: A high-resolution (1/20°) global ocean general circulation model with graphics processing unit (GPU) code implementations is developed based on the LASG/IAP Climate System Ocean Model version 3 (LICOM3) under a heterogeneous-compute interface for portability (HIP) framework. The dynamic core and physics package of LICOM3 are both ported to the GPU, and three-dimensional parallelization (also partitioned in the vertical direction) is applied. The HIP version of LICOM3 (LICOM3-HIP) is 42 times faster than the same number of CPU cores when 384 AMD GPUs and CPU cores are used. LICOM3-HIP has excellent scalability; it can still obtain a speedup of more than 4 on 9216 GPUs compared to 384 GPUs.
In this phase, we successfully performed a test of 1/20° LICOM3-HIP using 6550 nodes and 26 200 GPUs, and on a large scale, the model's speed was increased to approximately 2.72 simulated years per day (SYPD). By putting almost all the computation processes inside GPUs, the time cost of data transfer between CPUs and GPUs was reduced, resulting in high performance. Simultaneously, a 14-year spin-up integration following phase 2 of the Ocean Model Intercomparison Project (OMIP-2) protocol of surface forcing was performed, and preliminary results were evaluated. We found that the model results had little difference from the CPU version.
Further comparison with observations and lower-resolution LICOM3 results suggests that the 1/20° LICOM3-HIP can reproduce the observations and produce many smaller-scale activities, such as submesoscale eddies and frontal-scale structures.
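The scaling and throughput figures quoted in the summary can be put in context with a little arithmetic. The sketch below is illustrative only and is not taken from the article: the function names are hypothetical, the speedup of 4 is used as the lower bound implied by "more than 4", and the wall-clock estimate assumes (which the summary does not state) that the whole 14-year spin-up ran at the reported 2.72 SYPD.

```python
def strong_scaling_efficiency(base_units, scaled_units, speedup):
    """Observed speedup divided by the ideal speedup (the resource ratio)."""
    ideal_speedup = scaled_units / base_units
    return speedup / ideal_speedup


def wall_clock_days(simulated_years, sypd):
    """Wall-clock days needed to simulate a number of years at a given SYPD."""
    return simulated_years / sypd


# Figures from the summary: 384 -> 9216 GPUs with a speedup of "more than 4".
# Using exactly 4.0 gives a lower bound on the parallel efficiency.
resource_ratio = 9216 / 384
efficiency_lower_bound = strong_scaling_efficiency(384, 9216, 4.0)

# Assumption (not stated in the summary): the 14-year spin-up ran at 2.72 SYPD.
spinup_days = wall_clock_days(14, 2.72)

print(f"resource ratio: {resource_ratio:.0f}x")
print(f"efficiency lower bound: {efficiency_lower_bound:.1%}")
print(f"estimated spin-up wall-clock time: {spinup_days:.2f} days")
```

Read this way, a speedup above 4 on 24 times as many GPUs corresponds to a strong-scaling efficiency above roughly one sixth, relative to the 384-GPU baseline.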
ISSN: 1991-959X, 1991-9603