Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.

The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating poin...


Bibliographic Details
Main Authors: Olaf Lubeck, Michael Lang, Ram Srinivasan, Greg Johnson
Format: Article
Language: English
Published: Hindawi Limited 2009-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.3233/SPR-2009-0266
id doaj-1b799f3e008947d3b747f0aaac93337d
record_format Article
spelling doaj-1b799f3e008947d3b747f0aaac93337d
2021-07-02T05:48:19Z
eng
Hindawi Limited
Scientific Programming, ISSN 1058-9244, 1875-919X
2009-01-01, vol. 17, no. 1-2, pp. 199-208
doi:10.3233/SPR-2009-0266
Olaf Lubeck (Los Alamos National Laboratory, Los Alamos, NM, USA)
Michael Lang (Los Alamos National Laboratory, Los Alamos, NM, USA)
Ram Srinivasan (Intel Fort Collins, CO, USA)
Greg Johnson (Google Mountain View, CA, USA)
collection DOAJ
language English
format Article
sources DOAJ
author Olaf Lubeck
Michael Lang
Ram Srinivasan
Greg Johnson
title Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2009-01-01
description The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message passing model that minimizes data movement. We compare the advantages/disadvantages of this programming model with a previous implementation using a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work in Monte Carlo performance models), that predicts overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
url http://dx.doi.org/10.3233/SPR-2009-0266