Optimization of memory management on distributed machine
Main Author: | Ha, Viet Hai |
---|---|
Language: | ENG |
Published: | Institut National des Télécommunications, 2012 |
Format: | PhD thesis |
Subjects: | [INFO:INFO_OH] Computer Science/Other; [SHS:ECO] Humanities and Social Sciences/Economics and Finance; CAPE (Checkpointing Aided Parallel Execution); OpenMP compliance; DICKPT (Discontinuous Incremental Checkpointing); UHLRC (Updated Home-based Lazy Release Consistency); Distributed memory system; Parallel computing |
Online Access: | http://tel.archives-ouvertes.fr/tel-00814630 http://tel.archives-ouvertes.fr/docs/00/81/46/30/PDF/These_HAVietHai.pdf |
Description:

To further exploit the capabilities of parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores, an easy-to-use parallel language remains an important challenge. From the programmer's point of view, OpenMP is very easy to use thanks to its support for incremental parallelization and its features for dynamically setting the number of threads and the scheduling strategy. However, because it was initially designed for shared-memory systems, OpenMP is usually limited to intra-node computations on distributed-memory systems. Many attempts have been made to port OpenMP to distributed systems. The most prominent approaches focus on exploiting the capabilities of a specific network architecture and therefore cannot provide an open solution. Others build on an already available software layer such as a DSM, MPI or Global Arrays and, as a consequence, have difficulty becoming a fully compliant, high-performance implementation of OpenMP.

As yet another attempt to build an OpenMP-compliant implementation for distributed-memory systems, CAPE (Checkpointing Aided Parallel Execution) was developed around the following idea: when a parallel section is reached, the master thread is dumped and its image is sent to the slaves; each slave then executes a different thread; at the end of the parallel section, each slave extracts the list of all modifications it performed locally and returns it to the master; the master merges these modifications and resumes its execution. To prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analysis showed that the large amount of data transferred between threads and the cost of extracting the list of modifications from complete checkpoints led to weak performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not handle the requirements of shared data.

This thesis presents the approaches we proposed to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that can save incremental checkpoints discontinuously during the execution of a process. Based on DICKPT, the execution speed of the new version of CAPE increased significantly. For example, the time to compute a large matrix-matrix product on a desktop cluster became very close to the execution time of an equivalent optimized MPI program. Moreover, the speedup of this new version is close to linear over various numbers of threads and problem sizes.
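The CAPE flow and the DICKPT-style modification lists described above can be pictured with a minimal, single-process Python sketch: the master's memory image is copied to each simulated slave, every slave runs its own chunk of the parallel region and returns only the diff of what it changed, and the master merges the diffs before resuming. The dict-as-memory model, the word-level granularity and all names here are illustrative assumptions, not the thesis's actual implementation, which operates on real process checkpoints sent over a network.

```python
# Single-process sketch of the CAPE execution model: dump the master image,
# let each "slave" run one chunk, collect incremental diffs, merge on master.
import copy

def run_parallel_region(master_memory, worker_bodies):
    """Simulate one OpenMP-like parallel region under the CAPE idea."""
    # 1. "Dump" the master image and hand a copy to every slave.
    snapshot = copy.deepcopy(master_memory)

    merged_updates = {}
    for body in worker_bodies:
        # 2. Each slave starts from the master's image and runs its own chunk.
        local_memory = copy.deepcopy(snapshot)
        body(local_memory)

        # 3. The slave extracts only what it changed (an incremental diff),
        #    instead of shipping a complete checkpoint back.
        diff = {addr: val for addr, val in local_memory.items()
                if snapshot.get(addr) != val}
        merged_updates.update(diff)  # assumes disjoint writes

    # 4. The master injects the modifications and resumes sequential execution.
    master_memory.update(merged_updates)
    return master_memory

if __name__ == "__main__":
    # Toy "memory": each (name, index) key plays the role of an address.
    memory = {("a", i): v for i, v in enumerate([1, 2, 3, 4])}
    memory.update({("result", i): 0 for i in range(4)})

    def slave_low(mem):   # computes result[0:2]
        for i in range(0, 2):
            mem[("result", i)] = mem[("a", i)] * 10

    def slave_high(mem):  # computes result[2:4]
        for i in range(2, 4):
            mem[("result", i)] = mem[("a", i)] * 10

    run_parallel_region(memory, [slave_low, slave_high])
    print([memory[("result", i)] for i in range(4)])   # [10, 20, 30, 40]
```

Because the merge happens at the granularity of individual locations and the slaves write disjoint locations, the outcome matches a sequential execution, which is exactly the Bernstein-conditions setting the first version of CAPE was limited to.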
On the shared-data side, we proposed UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it better suited to the characteristics of CAPE. Prototypes and algorithms implementing the OpenMP synchronization and data-sharing clauses and directives are also specified. Together, these two contributions enable CAPE to respect shared-data behavior.
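Since the paragraph above only names the memory model, the sketch below shows the generic home-based lazy release consistency bookkeeping that HLRC, and therefore UHLRC, builds on: each shared page has a home node, a writer diffs its cached copy against a twin, and the diff reaches the home at a release point. The page size, class names and in-process calls are assumptions made for illustration; UHLRC's actual changes to HLRC are specified in the thesis itself and are not reproduced here.

```python
# Toy model of home-based lazy release consistency: pages live at a home
# node, writers keep a twin, and diffs are propagated only at release time.
PAGE_SIZE = 8  # words per page, illustrative only

class HomeNode:
    """Holds the authoritative copy of the pages it is home for."""
    def __init__(self, pages):
        self.pages = {pid: list(words) for pid, words in pages.items()}

    def apply_diff(self, page_id, diff):
        # diff: {word_index: new_value}, produced by a writer at release time
        for idx, value in diff.items():
            self.pages[page_id][idx] = value

    def fetch(self, page_id):
        # A reader faulting on an invalidated page pulls it from the home.
        return list(self.pages[page_id])

class WriterNode:
    """A node that writes a cached copy of a page and diffs it at release."""
    def __init__(self, home, page_id):
        self.home, self.page_id = home, page_id
        self.copy = home.fetch(page_id)
        self.twin = list(self.copy)  # twin kept to compute the diff later

    def write(self, idx, value):
        self.copy[idx] = value

    def release(self):
        # Lazy part: modifications travel only at synchronization points.
        diff = {i: v for i, (t, v) in enumerate(zip(self.twin, self.copy))
                if t != v}
        self.home.apply_diff(self.page_id, diff)
        self.twin = list(self.copy)

if __name__ == "__main__":
    home = HomeNode({0: [0] * PAGE_SIZE})
    writer = WriterNode(home, 0)
    writer.write(3, 42)
    writer.release()          # diff {3: 42} reaches the home
    print(home.fetch(0))      # [0, 0, 0, 42, 0, 0, 0, 0]
```

Release-time diffs of this kind resemble the modification lists CAPE already extracts, which suggests why a home-based model is a natural fit for it; the precise integration is detailed in the thesis.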