Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor

The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outli...

Bibliographic Details
Main Authors: Lukasz Szustak, Krzysztof Rojek, Tomasz Olas, Lukasz Kuczynski, Kamil Halbiniak, Pawel Gepner
Format: Article
Language: English
Published: Hindawi Limited 2015-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2015/642705
id doaj-b85f21087c314769baf23c230ffe09e7
record_format Article
spelling doaj-b85f21087c314769baf23c230ffe09e7; 2021-07-02T02:13:38Z; eng; Hindawi Limited; Scientific Programming; 1058-9244; 1875-919X; 2015-01-01; 2015; 10.1155/2015/642705; 642705; Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor
Lukasz Szustak, Krzysztof Rojek, Tomasz Olas, Lukasz Kuczynski, Kamil Halbiniak (Czestochowa University of Technology, Częstochowa, Poland); Pawel Gepner (Intel Corporation, Pipers Way, Swindon, Wiltshire SN3 1RJ, UK)
http://dx.doi.org/10.1155/2015/642705
collection DOAJ
sources DOAJ
issn 1058-9244
1875-919X
description The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. To utilize the available computing resources, we propose a (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on a combination of loop tiling and loop fusion techniques. It allows us to ease memory and communication bounds and to better exploit the theoretical floating-point efficiency of the target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning the available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for efficiently distributing MPDATA computation across the available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, containing two CPUs and an Intel Xeon Phi coprocessor. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.
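The combination of loop tiling and fusion named in the description can be illustrated with a minimal sketch. This is not the MPDATA kernel set or the authors' (3 + 1)D decomposition: it is a toy two-stage 1D stencil, and every name in it (`stage1`, `stage2`, `run_unfused`, `run_tiled_fused`, `tile`) is a hypothetical chosen for illustration. The unfused version writes the whole intermediate array before the second stage reads it; the tiled and fused version computes the intermediate values only for one tile plus a one-cell halo, so they can stay in a small buffer.

```python
# Illustrative only: a toy two-stage 1D stencil, not the MPDATA kernels.
# All function and parameter names here are hypothetical.

def stage1(a, i):
    # First stencil: 3-point average of the input around index i.
    return (a[i - 1] + a[i] + a[i + 1]) / 3.0

def stage2(b, i):
    # Second stencil: 3-point second difference of stage-1 results.
    return b[i + 1] - 2.0 * b[i] + b[i - 1]

def run_unfused(a):
    # Each stage sweeps the whole domain; the full intermediate
    # array b is written out before stage 2 starts reading it.
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):
        b[i] = stage1(a, i)
    c = [0.0] * n
    for i in range(2, n - 2):
        c[i] = stage2(b, i)
    return c

def run_tiled_fused(a, tile=4):
    # Both stages are executed tile by tile. For output indices
    # start..end-1, stage 1 is evaluated on start-1..end (the tile
    # plus a one-cell halo on each side) into a small buffer, and
    # stage 2 consumes that buffer immediately. Halo values are
    # recomputed by neighbouring tiles.
    n = len(a)
    c = [0.0] * n
    for start in range(2, n - 2, tile):
        end = min(start + tile, n - 2)
        lo = start - 1
        buf = [stage1(a, i) for i in range(lo, end + 1)]
        for i in range(start, end):
            c[i] = stage2(buf, i - lo)
    return c

if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(20)]
    print(run_unfused(data) == run_tiled_fused(data, tile=4))
```

The sketch trades a few redundant halo evaluations per tile for a much smaller intermediate working set, which is the general idea behind easing memory bounds with tiling and fusion. How tiles are sized and assigned to work teams of cores on a real platform is outside what this example shows.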