Optimization of the Brillouin operator on the KNL architecture

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv =...

Bibliographic Details
Main Author:	Dürr Stephan
Format:	Article
Language:	English
Published:	EDP Sciences 2018-01-01
Series:	EPJ Web of Conferences
Online Access:	https://doi.org/10.1051/epjconf/201817502001

id	doaj-3ebc8530c350498c9dfdb1048ceb6471
record_format	Article
spelling	doaj-3ebc8530c350498c9dfdb1048ceb6471 2021-08-02T14:44:13Z eng EDP Sciences EPJ Web of Conferences 2100-014X 2018-01-01 175 02001 10.1051/epjconf/201817502001 epjconf_lattice2018_02001 Optimization of the Brillouin operator on the KNL architecture Dürr Stephan Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added. https://doi.org/10.1051/epjconf/201817502001
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Dürr Stephan
spellingShingle	Dürr Stephan Optimization of the Brillouin operator on the KNL architecture EPJ Web of Conferences
author_facet	Dürr Stephan
author_sort	Dürr Stephan
title	Optimization of the Brillouin operator on the KNL architecture
title_short	Optimization of the Brillouin operator on the KNL architecture
title_full	Optimization of the Brillouin operator on the KNL architecture
title_fullStr	Optimization of the Brillouin operator on the KNL architecture
title_full_unstemmed	Optimization of the Brillouin operator on the KNL architecture
title_sort	optimization of the brillouin operator on the knl architecture
publisher	EDP Sciences
series	EPJ Web of Conferences
issn	2100-014X
publishDate	2018-01-01
description	Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
url	https://doi.org/10.1051/epjconf/201817502001
work_keys_str_mv	AT durrstephan optimizationofthebrillouinoperatorontheknlarchitecture
_version_	1721230981237571584
