Optimization of the Brillouin operator on the KNL architecture
Main Author: | Dürr, Stephan |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2018-01-01 |
Series: | EPJ Web of Conferences, vol. 175, article 02001 |
Online Access: | https://doi.org/10.1051/epjconf/201817502001 |
ISSN: | 2100-014X |
Abstract: | Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added. |
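The abstract describes a matrix-times-vector kernel that is parallelized with OMP pragmas alone and applied to Nv = 12 right-hand sides at once. As a rough illustration of that loop structure only, the sketch below applies a single nearest-neighbour hopping term (a 3×3 complex matrix times a neighbour's color vector) to every right-hand side, with lattice sites distributed over threads by one `#pragma omp parallel for`. It is not the Brillouin operator, which couples each site to many more neighbours and carries spin structure. The 1D periodic chain, the `field[site][rhs][color]` layout, and the names `hop_multi_rhs` and `fidx` are hypothetical choices made for this sketch, not taken from the paper.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Sizes quoted in the abstract: NC colors, NV right-hand sides. */
#define NC 3
#define NV 12

typedef double complex cplx;

/* Hypothetical layout: field[site][rhs][color]. The NV right-hand sides of
   one site are contiguous, so each 3x3 link is reused for all NV vectors. */
static inline size_t fidx(size_t site, int v, int c) {
    return (site * NV + (size_t)v) * NC + (size_t)c;
}

/* Hypothetical kernel: out(x) = U(x) * in(x+1) for every right-hand side,
   on a periodic 1D chain of nsites sites. One hopping term only; this is
   NOT the Brillouin operator. U holds one row-major 3x3 matrix per site. */
void hop_multi_rhs(size_t nsites, const cplx *U, const cplx *in, cplx *out) {
    assert(nsites > 0);
    /* Sites are independent, so one OpenMP pragma splits them over threads;
       without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(static)
    for (long x = 0; x < (long)nsites; ++x) {
        size_t xs = (size_t)x;
        size_t xp = (xs + 1) % nsites;          /* periodic forward neighbour */
        const cplx *u = U + xs * NC * NC;       /* link matrix at site x */
        for (int v = 0; v < NV; ++v) {
            for (int a = 0; a < NC; ++a) {
                cplx acc = 0;
                for (int b = 0; b < NC; ++b)
                    acc += u[a * NC + b] * in[fidx(xp, v, b)];
                out[fidx(xs, v, a)] = acc;
            }
        }
    }
}
```

A call such as `hop_multi_rhs(nsites, U, in, out)` expects `U` to hold `nsites * NC * NC` complex numbers and `in`/`out` to hold `nsites * NV * NC` each. Keeping the right-hand sides of a site adjacent means each link matrix is read from memory once and applied NV times, which is a common reason for processing several right-hand sides together.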