Optimization of memory management on distributed machine

Full description

In order to further explore the capabilities of parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores, an easy-to-use parallel language remains an important and challenging issue. From the programmer's point of view, OpenMP is very easy to use, thanks to its support for incremental parallelization and its features for dynamically setting the number of threads and the scheduling strategy. However, as it was initially designed for shared-memory systems, OpenMP is usually limited to intra-node computations on distributed-memory systems. Many attempts have been made to port OpenMP to distributed systems. The most prominent approaches focus on exploiting the capabilities of a specific network architecture and therefore cannot provide an open solution. Others are built on top of an existing software layer such as a DSM, MPI or Global Arrays and, as a consequence, have difficulty becoming a fully compliant, high-performance implementation of OpenMP. As yet another attempt to build an OpenMP-compliant implementation for distributed-memory systems, CAPE (Checkpointing Aided Parallel Execution) has been developed around the following idea: when a parallel section is reached, the master thread is dumped and its image is sent to the slaves; each slave then executes a different thread; at the end of the parallel section, the slave threads extract the list of all modifications they performed locally and return it to the master thread; the master merges these modifications and resumes its execution. To prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analyses showed that the large amount of data transferred between threads, together with the cost of extracting the list of modifications from complete checkpoints, led to weak performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not address the requirements of shared data. This thesis presents the approaches we proposed to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that can save incremental checkpoints discontinuously during the execution of a process. Based on DICKPT, the execution speed of the new version of CAPE increased significantly. For example, the time to compute a large matrix-matrix product on a desktop cluster became very similar to the execution time of the same optimized MPI program. Moreover, the speedup achieved by this new version for various numbers of threads is almost linear across different problem sizes.

Regarding shared data, we proposed UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it better suited to the characteristics of CAPE. Prototypes and algorithms implementing the synchronization constructs and the OpenMP data-sharing clauses and directives are also specified. Together, these two contributions enable CAPE to respect shared-data semantics.
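
As a rough illustration of the execution model just described, the following minimal Python sketch simulates the CAPE round-trip on a single machine. All names and data structures here are invented for illustration and are not the thesis implementation; the diff step only hints at the DICKPT idea of returning modifications instead of complete checkpoints.

    import copy

    def parallel_region(master, num_slaves, body):
        image = copy.deepcopy(master)       # dump the master and ship its image
        diffs = []
        for rank in range(num_slaves):      # this loop stands in for remote slaves
            slave = copy.deepcopy(image)
            body(slave, rank, num_slaves)   # each slave executes a different chunk
            # DICKPT-style step: return only the modified entries,
            # not a complete checkpoint of the slave's memory
            diffs.append({k: v for k, v in slave.items() if image.get(k) != v})
        for diff in diffs:                  # master merges the diffs and resumes
            master.update(diff)
        return master

    # Toy body: slaves write disjoint indices, so Bernstein's conditions hold
    def body(state, rank, size):
        for i in range(rank, len(state), size):
            state[i] = i * i

    print(parallel_region({i: 0 for i in range(8)}, num_slaves=2, body=body))
    # -> {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}

With complete checkpoints, each slave would ship its entire image back to the master; returning only the modifications is what the abstract credits for making the DICKPT-based version competitive with the equivalent optimized MPI program.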

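On the shared-data side, the following sketch shows the baseline home-based lazy release consistency (HLRC) mechanism that UHLRC modifies: each shared page has a home node, writers push their diffs to the home at release time, and readers refresh their copy from the home at acquire time. The UHLRC-specific update policy is specified in the thesis and is not reproduced here; the class and method names below are hypothetical.

    class HomeNode:
        """Holds the master copy of one shared page."""
        def __init__(self, page):
            self.page = dict(page)
        def apply_diff(self, diff):        # writers push diffs at release time
            self.page.update(diff)
        def fetch(self):                   # readers pull a fresh copy at acquire time
            return dict(self.page)

    class Worker:
        def __init__(self, home):
            self.home = home
            self.twin = home.fetch()       # snapshot kept to compute diffs later
            self.copy = dict(self.twin)
        def write(self, key, value):       # local write, invisible to others for now
            self.copy[key] = value
        def release(self):                 # lazy: diffs reach the home only here
            diff = {k: v for k, v in self.copy.items() if self.twin.get(k) != v}
            self.home.apply_diff(diff)
        def acquire(self):                 # refresh the local copy from the home
            self.twin = self.home.fetch()
            self.copy = dict(self.twin)

    home = HomeNode({"x": 0})
    w1, w2 = Worker(home), Worker(home)
    w1.write("x", 42); w1.release()        # w1 publishes its write
    w2.acquire()                           # w2 sees it only after acquiring
    print(w2.copy["x"])                    # -> 42
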
Bibliographic Details
Main Author: Ha, Viet Hai
Language: ENG
Published: Institut National des Télécommunications, 2012
Format: PhD thesis
Subjects: [INFO:INFO_OH] Computer Science/Other; [SHS:ECO] Humanities and Social Sciences/Economies and finances; CAPE; Checkpointing aided parallel execution; OpenMP compliance; DICKPT; Discontinuous incremental checkpointing; UHLRC; Updated home-based lazy release consistency; Distributed memory system; Parallel computing
Online Access: http://tel.archives-ouvertes.fr/tel-00814630
http://tel.archives-ouvertes.fr/docs/00/81/46/30/PDF/These_HAVietHai.pdf