Co-relation between OpenMP CutOff level and Cache performance


Bibliographic Details
Main Authors: Wang, MingHsien, 王銘幰
Other Authors: Joseph M. Arul
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/13171790028260235055
Description
Summary: Master's === Fu Jen Catholic University === Department of Computer Science and Information Engineering === 101 === Since multi-core processors emerged from single-core designs, the OpenMP Application Program Interface (API) has become a standard for easily parallelizing existing software on multi-core processors. Successive specifications have provided a parallel programming model that is portable across shared-memory architectures from different vendors. Version 3.0 introduced task-level parallelism, mainly to distribute work across parallel computing nodes. When many tasks are created, granularity becomes a concern; in recursive programs especially, a cutoff level makes it easy to manage both the granularity and the number of tasks, which has a great impact on application performance. This research focuses on the performance impact and the correlation between the cutoff level and cache performance. Does the cache miss rate stay low as the cutoff level increases? If cache misses increase as the cutoff level is raised, performance may degrade; under those circumstances, raising the cutoff level further may not be useful. Can the cutoff level exceed the number of available cores, and if so, how large a cutoff level is worth using for a given number of cores? For this purpose we used the Gleipnir memory-profiling tool, built on the Valgrind binary instrumentation framework. Valgrind is an instrumentation framework for building dynamic analysis tools that automatically detect many memory-management and threading problems in programs. The memory traces collected by Gleipnir are fed to DineroIV to accurately measure the cache misses of programs. The evaluation shows that in many benchmarks, when the cutoff level exceeds the number of cores, cache misses increase drastically; as a result, performance does not improve to any great extent.
Several recursive programs, as well as a few benchmarks from the Barcelona OpenMP Tasks Suite (BOTS) for task-level parallelism, were chosen. After running these benchmarks several times, accurate cache miss counts and performance figures were observed. Several memory-bound benchmarks show that cache misses increase when the cutoff level rises beyond the number of cores; the correlation between cache misses and the cutoff level is clear. Many benchmarks suggest it is best to keep the cutoff level close to the number of cores. In the future, it would be interesting to study how increasing the number of cores interacts with the cutoff level and task-level parallelism.