Optimization of Computations on Related Sets in CUDA
<p>Many algorithms for analysis and synthesis of complex system structures have substantial internal parallelism. When such an algorithm is implemented on a specific parallel computing system, however, the achieved speedup can be very low, primarily because of hardware and software features of that system. This article identifies the factors that prevent parallel algorithms for operations on graph models from reaching their theoretical speedup on CUDA GPUs, and develops recommendations that, when followed, reduce thread idle time.</p><p>The article considers a model of program execution in CUDA, characterizes how threads are executed, and proposes two methods of scheduling threads: iterative invocation of the parallel (kernel) function, and iterative execution of code within a single kernel. To design effective data structures, the features of CUDA memory were analyzed, different memory allocation algorithms were studied, and a critical section was implemented in CUDA that does not deadlock threads. Based on these results, recommendations are formulated for effective use of the parallelization potential of CUDA.</p><p>As a practical test of the proposed methods and rules, a data structure for representing graphs, built on CUDA principles, was developed, and several variants of a parallel algorithm for the vertex decomposition operation were implemented: without a critical section, with iterative invocation of the kernel; without a critical section, with iterative code execution inside the kernel; and with a critical section, with iterative code execution inside the kernel. Analysis of the experimental results shows that the developed recommendations are useful and that CUDA implementations of graph operations on large-scale problems are highly efficient.</p>
Main Authors: | G. S. Ivanova, A. A. Golovkov |
---|---|
Format: | Article |
Language: | Russian |
Published: | MGTU im. N.È. Baumana, 2015-01-01 |
Series: | Nauka i Obrazovanie |
Subjects: | operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
Online Access: | http://technomag.edu.ru/jour/article/view/160 |
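The abstract contrasts two ways of scheduling threads: iterative invocation of the parallel (kernel) function from the host, and iterative code execution inside a single kernel launch. A minimal sketch of the two patterns is below; the kernel names and the toy workload are illustrative, not taken from the article.

```cuda
#include <cuda_runtime.h>

// Variant A: iterative execution of the parallel function.
// The host loops, launching a short kernel once per iteration,
// paying kernel-launch overhead on every step.
__global__ void stepKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;          // one unit of work per launch
}

// Variant B: iterative code execution inside the parallel function.
// A single launch; each thread loops over its share of the work
// (a grid-stride loop), amortizing the launch overhead.
__global__ void loopKernel(float *data, int n, int steps) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        for (int s = 0; s < steps; ++s) data[i] += 1.0f;
    }
}

int main() {
    const int n = 1 << 20, steps = 100;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Variant A: 100 launches of a one-step kernel.
    for (int s = 0; s < steps; ++s)
        stepKernel<<<(n + 255) / 256, 256>>>(d, n);

    // Variant B: one launch that performs all 100 steps internally.
    loopKernel<<<(n + 255) / 256, 256>>>(d, n, steps);

    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Which variant wins depends on the workload: per-launch overhead favors B, while A re-establishes a grid-wide synchronization point between iterations, which some algorithms on graphs require.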
id |
doaj-ea170211e3f14705a803ef47a068c092 |
---|---|
record_format |
Article |
spelling |
doaj-ea170211e3f14705a803ef47a068c092 | 2020-11-24T22:27:14Z | rus | MGTU im. N.È. Baumana | Nauka i Obrazovanie | ISSN 1994-0408 | 2015-01-01 | no. 10, pp. 271-287 | DOI 10.7463/1015.0820521 | article 160 | Optimization of Computations on Related Sets in CUDA | G. S. Ivanova; A. A. Golovkov (Bauman Moscow State Technical University) | http://technomag.edu.ru/jour/article/view/160 | operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
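The deadlock-free critical section mentioned in the abstract addresses a known CUDA pitfall: if every thread of a warp spins on a lock, the thread that acquires it may be unable to proceed while its diverged warp siblings keep spinning, deadlocking the warp under SIMT execution. The article's actual implementation is not reproduced in this record; the sketch below shows one common deadlock-free pattern, electing a single thread per block to contend for the lock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int lock = 0;      // 0 = free, 1 = held
__device__ int counter = 0;   // shared resource guarded by the lock

__global__ void criticalKernel() {
    // Only thread 0 of each block contends for the lock. Letting every
    // thread of a warp spin on atomicCAS can deadlock: the lock holder
    // may be stalled while its warp siblings keep spinning.
    if (threadIdx.x == 0) {
        while (atomicCAS(&lock, 0, 1) != 0) { }  // spin until acquired
        __threadfence();       // see other blocks' writes after acquire
        counter += 1;          // critical section: plain update is safe here
        __threadfence();       // publish the update before releasing
        atomicExch(&lock, 0);  // release
    }
}

int main() {
    criticalKernel<<<64, 256>>>();
    cudaDeviceSynchronize();
    int result = 0;
    cudaMemcpyFromSymbol(&result, counter, sizeof(int));
    // One increment per block: result should be 64.
    printf("counter = %d\n", result);
    return 0;
}
```

If all threads of a block must perform work under the lock, the elected thread can acquire it, the block can synchronize with `__syncthreads()`, and the same thread can release it afterward.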
collection |
DOAJ |
language |
Russian |
format |
Article |
sources |
DOAJ |
author |
G. S. Ivanova; A. A. Golovkov |
spellingShingle |
G. S. Ivanova A. A. Golovkov Optimization of Computations on Related Sets in CUDA Nauka i Obrazovanie operation on graph parallel algorithm CUDA optimization thread mutex data structure |
author_facet |
G. S. Ivanova; A. A. Golovkov |
author_sort |
G. S. Ivanova |
title |
Optimization of Computations on Related Sets in CUDA |
title_short |
Optimization of Computations on Related Sets in CUDA |
title_full |
Optimization of Computations on Related Sets in CUDA |
title_fullStr |
Optimization of Computations on Related Sets in CUDA |
title_full_unstemmed |
Optimization of Computations on Related Sets in CUDA |
title_sort |
optimization of computations on related sets in cuda |
publisher |
MGTU im. N.È. Baumana |
series |
Nauka i Obrazovanie |
issn |
1994-0408 |
publishDate |
2015-01-01 |
description |
<p>Many algorithms for analysis and synthesis of complex system structures have substantial internal parallelism. When such an algorithm is implemented on a specific parallel computing system, however, the achieved speedup can be very low, primarily because of hardware and software features of that system. This article identifies the factors that prevent parallel algorithms for operations on graph models from reaching their theoretical speedup on CUDA GPUs, and develops recommendations that, when followed, reduce thread idle time.</p><p>The article considers a model of program execution in CUDA, characterizes how threads are executed, and proposes two methods of scheduling threads: iterative invocation of the parallel (kernel) function, and iterative execution of code within a single kernel. To design effective data structures, the features of CUDA memory were analyzed, different memory allocation algorithms were studied, and a critical section was implemented in CUDA that does not deadlock threads. Based on these results, recommendations are formulated for effective use of the parallelization potential of CUDA.</p><p>As a practical test of the proposed methods and rules, a data structure for representing graphs, built on CUDA principles, was developed, and several variants of a parallel algorithm for the vertex decomposition operation were implemented: without a critical section, with iterative invocation of the kernel; without a critical section, with iterative code execution inside the kernel; and with a critical section, with iterative code execution inside the kernel. Analysis of the experimental results shows that the developed recommendations are useful and that CUDA implementations of graph operations on large-scale problems are highly efficient.</p> |
topic |
operation on graph; parallel algorithm; CUDA; optimization; thread; mutex; data structure |
url |
http://technomag.edu.ru/jour/article/view/160 |
work_keys_str_mv |
AT gsivanova optimizationofcomputationsonrelatedsetsincuda AT aagolovkov optimizationofcomputationsonrelatedsetsincuda |
_version_ |
1725750745190367232 |