Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.
The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating poin...
Main Authors: | Olaf Lubeck, Michael Lang, Ram Srinivasan, Greg Johnson |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2009-01-01 |
Series: | Scientific Programming |
Online Access: | http://dx.doi.org/10.3233/SPR-2009-0266 |
id |
doaj-1b799f3e008947d3b747f0aaac93337d |
---|---|
record_format |
Article |
spelling |
doaj-1b799f3e008947d3b747f0aaac93337d; 2021-07-02T05:48:19Z; eng; Hindawi Limited; Scientific Programming; 1058-9244; 1875-919X; 2009-01-01; 17; 1-2; 199; 208; 10.3233/SPR-2009-0266; Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.; Olaf Lubeck (Los Alamos National Laboratory, Los Alamos, NM, USA); Michael Lang (Los Alamos National Laboratory, Los Alamos, NM, USA); Ram Srinivasan (Intel, Fort Collins, CO, USA); Greg Johnson (Google, Mountain View, CA, USA); The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message passing model that minimizes data movement. We compare the advantages/disadvantages of this programming model with a previous implementation using a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work in Monte Carlo performance models), that predicts overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.; http://dx.doi.org/10.3233/SPR-2009-0266 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Olaf Lubeck Michael Lang Ram Srinivasan Greg Johnson |
author_sort |
Olaf Lubeck |
title |
Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E. |
publisher |
Hindawi Limited |
series |
Scientific Programming |
issn |
1058-9244 1875-919X |
publishDate |
2009-01-01 |
description |
The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message passing model that minimizes data movement. We compare the advantages/disadvantages of this programming model with a previous implementation using a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work in Monte Carlo performance models), that predicts overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures. |
url |
http://dx.doi.org/10.3233/SPR-2009-0266 |