Optimization of the Brillouin operator on the KNL architecture

Bibliographic Details
Main Author:	Dürr, Stephan
Format:	Article
Language:	English
Published:	EDP Sciences 2018-01-01
Series:	EPJ Web of Conferences
Online Access:	https://doi.org/10.1051/epjconf/201817502001

Description
Summary:	Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
ISSN:	2100-014X

Similar Items