Optimization of memory management on distributed machine
Main Author: | Ha, Viet Hai |
---|---|
Language: | ENG |
Published: | Institut National des Télécommunications, 2012 |
Format: | PhD thesis |
Subjects: | [INFO:INFO_OH] Computer Science/Other; [SHS:ECO] Humanities and Social Sciences/Economics and Finance; CAPE (Checkpointing Aided Parallel Execution); OpenMP compliance; DICKPT (Discontinuous Incremental Checkpointing); UHLRC (Updated Home-based Lazy Release Consistency); Distributed memory system; Parallel computing |
Online Access: | http://tel.archives-ouvertes.fr/tel-00814630 http://tel.archives-ouvertes.fr/docs/00/81/46/30/PDF/These_HAVietHai.pdf |
Description:

To further exploit the capabilities of parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores, an easy-to-use parallel language remains an important challenge. From the programmer's point of view, OpenMP is very easy to use thanks to its support for incremental parallelization and its features for dynamically setting the number of threads and the scheduling strategy. However, because it was initially designed for shared-memory systems, OpenMP is usually limited to intra-node computations on distributed-memory systems. Many attempts have been made to port OpenMP to distributed systems. The most prominent approaches focus on exploiting the capabilities of a specific network architecture and therefore cannot provide an open solution. Others build on an already available software layer such as a DSM, MPI or Global Arrays and, as a consequence, have difficulty becoming a fully compliant, high-performance implementation of OpenMP.

As yet another attempt to build an OpenMP-compliant implementation for distributed-memory systems, CAPE (Checkpointing Aided Parallel Execution) was developed around the following idea: when a parallel section is reached, the master thread is dumped and its image is sent to the slaves; each slave then executes a different thread; at the end of the parallel section, each slave extracts the list of all modifications it performed locally and returns it to the master; the master merges these modifications and resumes its execution. To prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analysis showed that the large amount of data transferred between threads and the cost of extracting the list of modifications from complete checkpoints led to weak performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not handle the requirements of shared data.

This thesis presents the approaches we proposed to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that can save incremental checkpoints discontinuously during the execution of a process. Based on DICKPT, the execution speed of the new version of CAPE increased significantly. For example, the time to compute a large matrix-matrix product on a desktop cluster became very close to the execution time of an equivalent optimized MPI program. Moreover, the speedup of this new version is close to linear over various numbers of threads and problem sizes.
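The CAPE flow and the DICKPT-style modification lists described above can be pictured with a minimal, single-process Python sketch: the master's memory image is copied to each simulated slave, every slave runs its own chunk of the parallel region and returns only the diff of what it changed, and the master merges the diffs before resuming. The dict-as-memory model, the word-level granularity and all names here are illustrative assumptions, not the thesis's actual implementation, which operates on real process checkpoints sent over a network.

```python
# Single-process sketch of the CAPE execution model: dump the master image,
# let each "slave" run one chunk, collect incremental diffs, merge on master.
import copy

def run_parallel_region(master_memory, worker_bodies):
    """Simulate one OpenMP-like parallel region under the CAPE idea."""
    # 1. "Dump" the master image and hand a copy to every slave.
    snapshot = copy.deepcopy(master_memory)

    merged_updates = {}
    for body in worker_bodies:
        # 2. Each slave starts from the master's image and runs its own chunk.
        local_memory = copy.deepcopy(snapshot)
        body(local_memory)

        # 3. The slave extracts only what it changed (an incremental diff),
        #    instead of shipping a complete checkpoint back.
        diff = {addr: val for addr, val in local_memory.items()
                if snapshot.get(addr) != val}
        merged_updates.update(diff)  # assumes disjoint writes

    # 4. The master injects the modifications and resumes sequential execution.
    master_memory.update(merged_updates)
    return master_memory

if __name__ == "__main__":
    # Toy "memory": each (name, index) key plays the role of an address.
    memory = {("a", i): v for i, v in enumerate([1, 2, 3, 4])}
    memory.update({("result", i): 0 for i in range(4)})

    def slave_low(mem):   # computes result[0:2]
        for i in range(0, 2):
            mem[("result", i)] = mem[("a", i)] * 10

    def slave_high(mem):  # computes result[2:4]
        for i in range(2, 4):
            mem[("result", i)] = mem[("a", i)] * 10

    run_parallel_region(memory, [slave_low, slave_high])
    print([memory[("result", i)] for i in range(4)])   # [10, 20, 30, 40]
```

Because the merge happens at the granularity of individual locations and the slaves write disjoint locations, the outcome matches a sequential execution, which is exactly the Bernstein-conditions setting the first version of CAPE was limited to.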
On the shared-data side, we proposed UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it better suited to the characteristics of CAPE. Prototypes and algorithms implementing the OpenMP synchronization and data-sharing clauses and directives are also specified. Together, these two contributions enable CAPE to respect shared-data behavior.
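Since the paragraph above only names the memory model, the sketch below shows the generic home-based lazy release consistency bookkeeping that HLRC, and therefore UHLRC, builds on: each shared page has a home node, a writer diffs its cached copy against a twin, and the diff reaches the home at a release point. The page size, class names and in-process calls are assumptions made for illustration; UHLRC's actual changes to HLRC are specified in the thesis itself and are not reproduced here.

```python
# Toy model of home-based lazy release consistency: pages live at a home
# node, writers keep a twin, and diffs are propagated only at release time.
PAGE_SIZE = 8  # words per page, illustrative only

class HomeNode:
    """Holds the authoritative copy of the pages it is home for."""
    def __init__(self, pages):
        self.pages = {pid: list(words) for pid, words in pages.items()}

    def apply_diff(self, page_id, diff):
        # diff: {word_index: new_value}, produced by a writer at release time
        for idx, value in diff.items():
            self.pages[page_id][idx] = value

    def fetch(self, page_id):
        # A reader faulting on an invalidated page pulls it from the home.
        return list(self.pages[page_id])

class WriterNode:
    """A node that writes a cached copy of a page and diffs it at release."""
    def __init__(self, home, page_id):
        self.home, self.page_id = home, page_id
        self.copy = home.fetch(page_id)
        self.twin = list(self.copy)  # twin kept to compute the diff later

    def write(self, idx, value):
        self.copy[idx] = value

    def release(self):
        # Lazy part: modifications travel only at synchronization points.
        diff = {i: v for i, (t, v) in enumerate(zip(self.twin, self.copy))
                if t != v}
        self.home.apply_diff(self.page_id, diff)
        self.twin = list(self.copy)

if __name__ == "__main__":
    home = HomeNode({0: [0] * PAGE_SIZE})
    writer = WriterNode(home, 0)
    writer.write(3, 42)
    writer.release()          # diff {3: 42} reaches the home
    print(home.fetch(0))      # [0, 0, 0, 42, 0, 0, 0, 0]
```

Release-time diffs of this kind resemble the modification lists CAPE already extracts, which suggests why a home-based model is a natural fit for it; the precise integration is detailed in the thesis.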