VECTORIZATION OF OPERATIONS ON SMALL-DIMENSIONAL MATRICES FOR INTEL XEON PHI KNIGHTS LANDING PROCESSOR

This article addresses the vectorization of computations for the Intel Xeon Phi Knights Landing (KNL) processor. Operations on small-dimensional matrices are the target of optimization. Such operations are widespread in computational codes across many research fields, for example, in computational fl...

Bibliographic Details
Main Authors:	Leonid A. Benderskiy, Sergey A. Leshchev, Alexey A. Rybakov
Format:	Article
Language:	Russian
Published:	The Fund for Promotion of Internet media, IT education, human development «League Internet Media» 2018-03-01
Series:	Современные информационные технологии и IT-образование
Subjects:	Matrix operations; vectorization; KNL; AVX-512; intrinsic functions
Online Access:	http://sitito.cs.msu.ru/index.php/SITITO/article/view/343

id	doaj-e0c01c62c3fe49288db4d120c2ac15fd
record_format	Article
spelling	doaj-e0c01c62c3fe49288db4d120c2ac15fd | 2020-12-02T12:11:45Z | rus | The Fund for Promotion of Internet media, IT education, human development «League Internet Media» | Современные информационные технологии и IT-образование | ISSN 2411-1473 | 2018-03-01 | vol. 14, no. 1, pp. 73-90 | DOI 10.25559/SITITO.14.201801.073-090 | VECTORIZATION OF OPERATIONS ON SMALL-DIMENSIONAL MATRICES FOR INTEL XEON PHI KNIGHTS LANDING PROCESSOR | Leonid A. Benderskiy, Sergey A. Leshchev, Alexey A. Rybakov (all: Scientific Research Institute for System Analysis of the Russian Academy of Sciences, SRISA, Moscow, Russia) | http://sitito.cs.msu.ru/index.php/SITITO/article/view/343 | Matrix operations; vectorization; KNL; AVX-512; intrinsic functions
collection	DOAJ
language	Russian
format	Article
sources	DOAJ
author	Leonid A. Benderskiy, Sergey A. Leshchev, Alexey A. Rybakov
title	VECTORIZATION OF OPERATIONS ON SMALL-DIMENSIONAL MATRICES FOR INTEL XEON PHI KNIGHTS LANDING PROCESSOR
publisher	The Fund for Promotion of Internet media, IT education, human development «League Internet Media»
series	Современные информационные технологии и IT-образование
issn	2411-1473
publishDate	2018-03-01
description	This article addresses the vectorization of computations for the Intel Xeon Phi Knights Landing (KNL) processor. Operations on small-dimensional matrices are the target of optimization. Such operations are widespread in computational codes across many research fields, for example, in computational fluid dynamics. KNL is the latest Intel Xeon Phi processor; it contains up to 72 cores and supports massively parallel applications. These processors offer a wide range of features for efficient supercomputer calculations; in particular, they support different memory and cluster modes. In many cases the compiler cannot generate high-performance, parallel, vectorized code, which leads to performance losses. One way to recover this performance is to vectorize the hot blocks of the code manually, which speeds up the application as a whole. An important step in optimizing a program for KNL processors is the use of special 512-bit vector instructions, which can significantly increase execution speed. A single 512-bit vector instruction processes a vector of 16 single-precision floating-point values. Special fused multiply-add instructions combine componentwise multiplication and addition of such vectors in one operation. To simplify manual vectorization, special intrinsic functions are used; these functions are essentially wrappers over the processor instructions. Vectorizing matrix operations with intrinsic functions reduced the execution time of these operations by 23% to 70% compared with the version produced by the Intel compiler at the maximum optimization level. The results reveal additional hidden performance reserves in applications that can be unlocked by manual optimization of the source code.
topic	Matrix operations; vectorization; KNL; AVX-512; intrinsic functions
url	http://sitito.cs.msu.ru/index.php/SITITO/article/view/343
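
The abstract above mentions 512-bit vectors of 16 floating-point values, fused multiply-add, and intrinsic functions. A minimal sketch of how these fit together is given below. It is not taken from the article: the function name fma16 and the constant LANES are hypothetical, and the operation shown (c = a * b + c on 16 floats) is only an illustration of the technique. The AVX-512 path is compiled only when the compiler defines __AVX512F__ (for example with -mavx512f or -xMIC-AVX512); otherwise a plain scalar loop is used so the sketch still builds on other machines.

```c
#include <stddef.h>

#ifdef __AVX512F__
#include <immintrin.h>   /* AVX-512 intrinsics */
#endif

/* Number of 32-bit floats in one 512-bit register. */
enum { LANES = 16 };

/*
 * Hypothetical helper: c[i] = a[i] * b[i] + c[i] for i = 0..15.
 * Each array must hold at least 16 floats; no alignment is assumed.
 */
void fma16(const float *a, const float *b, float *c)
{
#ifdef __AVX512F__
    /* Unaligned loads of 16 floats each. */
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    __m512 vc = _mm512_loadu_ps(c);

    /* One fused multiply-add over all 16 lanes: va * vb + vc. */
    vc = _mm512_fmadd_ps(va, vb, vc);

    _mm512_storeu_ps(c, vc);
#else
    /* Portable fallback: separate multiply and add, not fused. */
    for (size_t i = 0; i < LANES; ++i) {
        c[i] = a[i] * b[i] + c[i];
    }
#endif
}
```

A 4x4 single-precision matrix stored contiguously occupies exactly 16 floats, so under this assumption a componentwise update of such a matrix would fit in a single call to fma16.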

Similar Items