Optimization of the Brillouin operator on the KNL architecture

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv =...

Bibliographic Details
Main Author: Dürr Stephan
Format: Article
Language: English
Published: EDP Sciences 2018-01-01
Series: EPJ Web of Conferences
Online Access: https://doi.org/10.1051/epjconf/201817502001
id doaj-3ebc8530c350498c9dfdb1048ceb6471
record_format Article
spelling doaj-3ebc8530c350498c9dfdb1048ceb6471 2021-08-02T14:44:13Z eng EDP Sciences EPJ Web of Conferences 2100-014X 2018-01-01 175 02001 10.1051/epjconf/201817502001 epjconf_lattice2018_02001 Optimization of the Brillouin operator on the KNL architecture Dürr Stephan Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32^3 × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added. https://doi.org/10.1051/epjconf/201817502001
collection DOAJ
language English
format Article
sources DOAJ
author Dürr Stephan
title Optimization of the Brillouin operator on the KNL architecture
publisher EDP Sciences
series EPJ Web of Conferences
issn 2100-014X
publishDate 2018-01-01
description Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32^3 × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
url https://doi.org/10.1051/epjconf/201817502001