Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.

The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates.


Bibliographic Details
Main Authors:	Olaf Lubeck, Michael Lang, Ram Srinivasan, Greg Johnson
Format:	Article
Language:	English
Published:	Hindawi Limited 2009-01-01
Series:	Scientific Programming
Online Access:	http://dx.doi.org/10.3233/SPR-2009-0266

id	doaj-1b799f3e008947d3b747f0aaac93337d
record_format	Article
spelling	doaj-1b799f3e008947d3b747f0aaac93337d; 2021-07-02T05:48:19Z; eng; Hindawi Limited; Scientific Programming; ISSN 1058-9244, 1875-919X; 2009-01-01; vol. 17, no. 1-2, pp. 199-208; doi:10.3233/SPR-2009-0266; Olaf Lubeck (Los Alamos National Laboratory, Los Alamos, NM, USA), Michael Lang (Los Alamos National Laboratory, Los Alamos, NM, USA), Ram Srinivasan (Intel Fort Collins, CO, USA), Greg Johnson (Google Mountain View, CA, USA)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Olaf Lubeck, Michael Lang, Ram Srinivasan, Greg Johnson
title	Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.
publisher	Hindawi Limited
series	Scientific Programming
issn	1058-9244 1875-919X
publishDate	2009-01-01
description	The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message passing model that minimizes data movement. We compare the advantages/disadvantages of this programming model with a previous implementation using a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work in Monte Carlo performance models), that predicts overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
url	http://dx.doi.org/10.3233/SPR-2009-0266


Similar Items