Optimization of Computations on Related Sets in CUDA
<p>Many algorithms for analysis and synthesis of complex system structures have substantial internal parallelism. When such an algorithm is implemented on a specific parallel computing system, however, the achieved speedup can be very low, primarily because of hardware and software features of that system. This article identifies the factors that prevent parallel algorithms for operations on graph models from reaching their theoretical speedup on CUDA GPUs, and develops recommendations that, when followed, reduce thread idle time.</p><p>The article considers a model of program execution in CUDA, characterizes how threads are executed, and proposes two methods of scheduling threads: iterative invocation of the parallel (kernel) function, and iterative execution of code within a single kernel. To design effective data structures, the features of CUDA memory were analyzed, different memory allocation algorithms were studied, and a critical section was implemented in CUDA that does not deadlock threads. Based on these results, recommendations are formulated for effective use of the parallelization potential of CUDA.</p><p>As a practical test of the proposed methods and rules, a data structure for representing graphs, built on CUDA principles, was developed, and several variants of a parallel algorithm for the vertex decomposition operation were implemented: without a critical section, with iterative invocation of the kernel; without a critical section, with iterative code execution inside the kernel; and with a critical section, with iterative code execution inside the kernel. Analysis of the experimental results shows that the developed recommendations are useful and that CUDA implementations of graph operations on large-scale problems are highly efficient.</p>
Main Authors: | G. S. Ivanova, A. A. Golovkov |
---|---|
Format: | Article |
Language: | Russian |
Published: | MGTU im. N.È. Baumana, 2015-01-01 |
Series: | Nauka i Obrazovanie |
Subjects: | operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
Online Access: | http://technomag.edu.ru/jour/article/view/160 |
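The abstract contrasts two ways of scheduling threads: iterative invocation of the parallel (kernel) function from the host, and iterative code execution inside a single kernel launch. A minimal sketch of the two patterns is below; the kernel names and the toy workload are illustrative, not taken from the article.

```cuda
#include <cuda_runtime.h>

// Variant A: iterative execution of the parallel function.
// The host loops, launching a short kernel once per iteration,
// paying kernel-launch overhead on every step.
__global__ void stepKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;          // one unit of work per launch
}

// Variant B: iterative code execution inside the parallel function.
// A single launch; each thread loops over its share of the work
// (a grid-stride loop), amortizing the launch overhead.
__global__ void loopKernel(float *data, int n, int steps) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        for (int s = 0; s < steps; ++s) data[i] += 1.0f;
    }
}

int main() {
    const int n = 1 << 20, steps = 100;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Variant A: 100 launches of a one-step kernel.
    for (int s = 0; s < steps; ++s)
        stepKernel<<<(n + 255) / 256, 256>>>(d, n);

    // Variant B: one launch that performs all 100 steps internally.
    loopKernel<<<(n + 255) / 256, 256>>>(d, n, steps);

    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Which variant wins depends on the workload: per-launch overhead favors B, while A re-establishes a grid-wide synchronization point between iterations, which some algorithms on graphs require.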
id |
doaj-ea170211e3f14705a803ef47a068c092 |
---|---|
record_format |
Article |
spelling |
doaj-ea170211e3f14705a803ef47a068c092 | 2020-11-24T22:27:14Z | rus | MGTU im. N.È. Baumana | Nauka i Obrazovanie | ISSN 1994-0408 | 2015-01-01 | no. 10, pp. 271-287 | DOI 10.7463/1015.0820521 | article 160 | Optimization of Computations on Related Sets in CUDA | G. S. Ivanova; A. A. Golovkov (Bauman Moscow State Technical University) | http://technomag.edu.ru/jour/article/view/160 | operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
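The deadlock-free critical section mentioned in the abstract addresses a known CUDA pitfall: if every thread of a warp spins on a lock, the thread that acquires it may be unable to proceed while its diverged warp siblings keep spinning, deadlocking the warp under SIMT execution. The article's actual implementation is not reproduced in this record; the sketch below shows one common deadlock-free pattern, electing a single thread per block to contend for the lock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int lock = 0;      // 0 = free, 1 = held
__device__ int counter = 0;   // shared resource guarded by the lock

__global__ void criticalKernel() {
    // Only thread 0 of each block contends for the lock. Letting every
    // thread of a warp spin on atomicCAS can deadlock: the lock holder
    // may be stalled while its warp siblings keep spinning.
    if (threadIdx.x == 0) {
        while (atomicCAS(&lock, 0, 1) != 0) { }  // spin until acquired
        __threadfence();       // see other blocks' writes after acquire
        counter += 1;          // critical section: plain update is safe here
        __threadfence();       // publish the update before releasing
        atomicExch(&lock, 0);  // release
    }
}

int main() {
    criticalKernel<<<64, 256>>>();
    cudaDeviceSynchronize();
    int result = 0;
    cudaMemcpyFromSymbol(&result, counter, sizeof(int));
    // One increment per block: result should be 64.
    printf("counter = %d\n", result);
    return 0;
}
```

If all threads of a block must perform work under the lock, the elected thread can acquire it, the block can synchronize with `__syncthreads()`, and the same thread can release it afterward.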
collection |
DOAJ |
language |
Russian |
format |
Article |
sources |
DOAJ |
author |
G. S. Ivanova; A. A. Golovkov |
spellingShingle |
G. S. Ivanova A. A. Golovkov Optimization of Computations on Related Sets in CUDA Nauka i Obrazovanie operation on graph parallel algorithm CUDA optimization thread mutex data structure |
author_facet |
G. S. Ivanova; A. A. Golovkov |
author_sort |
G. S. Ivanova |
title |
Optimization of Computations on Related Sets in CUDA |
title_short |
Optimization of Computations on Related Sets in CUDA |
title_full |
Optimization of Computations on Related Sets in CUDA |
title_fullStr |
Optimization of Computations on Related Sets in CUDA |
title_full_unstemmed |
Optimization of Computations on Related Sets in CUDA |
title_sort |
optimization of computations on related sets in cuda |
publisher |
MGTU im. N.È. Baumana |
series |
Nauka i Obrazovanie |
issn |
1994-0408 |
publishDate |
2015-01-01 |
description |
<p>Many algorithms for analysis and synthesis of complex system structures have substantial internal parallelism. When such an algorithm is implemented on a specific parallel computing system, however, the achieved speedup can be very low, primarily because of hardware and software features of that system. This article identifies the factors that prevent parallel algorithms for operations on graph models from reaching their theoretical speedup on CUDA GPUs, and develops recommendations that, when followed, reduce thread idle time.</p><p>The article considers a model of program execution in CUDA, characterizes how threads are executed, and proposes two methods of scheduling threads: iterative invocation of the parallel (kernel) function, and iterative execution of code within a single kernel. To design effective data structures, the features of CUDA memory were analyzed, different memory allocation algorithms were studied, and a critical section was implemented in CUDA that does not deadlock threads. Based on these results, recommendations are formulated for effective use of the parallelization potential of CUDA.</p><p>As a practical test of the proposed methods and rules, a data structure for representing graphs, built on CUDA principles, was developed, and several variants of a parallel algorithm for the vertex decomposition operation were implemented: without a critical section, with iterative invocation of the kernel; without a critical section, with iterative code execution inside the kernel; and with a critical section, with iterative code execution inside the kernel. Analysis of the experimental results shows that the developed recommendations are useful and that CUDA implementations of graph operations on large-scale problems are highly efficient.</p> |
topic |
operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
url |
http://technomag.edu.ru/jour/article/view/160 |
work_keys_str_mv |
AT gsivanova optimizationofcomputationsonrelatedsetsincuda AT aagolovkov optimizationofcomputationsonrelatedsetsincuda |
_version_ |
1725750745190367232 |