Vforce: VSIPL++ for reconfigurable computing environments.


Bibliographic Details
Published:
Online Access: http://hdl.handle.net/2047/d20001239
Description
Summary: Systems with heterogeneous processing elements, such as commodity software processors combined with special-purpose processors like FPGAs or GPUs, offer enormous potential speedups for certain types of workloads. Developing programs for these systems, however, poses significant challenges. Programs written for them tend to mix platform-specific code into the rest of the application code, which makes portability difficult. In addition, these systems have different programming models and tools, so the developer needs hardware-specific knowledge as well as application domain expertise. Compounding these two problems is the short lifetime of these systems. A mechanism for portability across multiple architectures and generations is therefore desirable. This thesis presents Vforce, an extensible framework that extends the VSIPL++ standard to add portable and transparent support for special-purpose processors. New library elements that include portable special-purpose processor support can be added to VSIPL++ through Vforce's generic hardware interface; the user application code and binary contain nothing specific to the special-purpose processors. The decision about which special-purpose processor, if any, executes the new library element is made at runtime by a hardware resource manager that runs on the system independently of the user application. This manager also provides the information necessary to bind Vforce's generic hardware interface to the specific API used by the selected special-purpose processor. The implementation of Vforce and two specific usage examples, an FFT and an adaptive time-domain beamformer, are discussed. Results for the two examples on a Cray XD1 heterogeneous supercomputer, as well as an analysis of the overhead added by Vforce, are presented. The results demonstrate the portability and performance achievable with the Vforce framework.
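
The runtime binding described in the summary can be illustrated with a minimal C++ sketch. It is not Vforce's actual API: the names GenericHardware, ResourceManager, FakeFpga, and double_values are hypothetical, the manager is an in-process object rather than a separate system service, and the software fallback used when no device is available is an assumption. A trivial "double every value" kernel stands in for an FFT or beamformer.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic hardware interface: the only type the library
// element is written against. Nothing here names a specific device.
class GenericHardware {
public:
    virtual ~GenericHardware() = default;
    virtual std::string device_name() const = 0;
    virtual std::vector<double> run(const std::string& kernel,
                                    const std::vector<double>& input) = 0;
};

// Stand-in for a device-specific binding (for example, a wrapper around
// an FPGA vendor API). It computes on the host purely for illustration.
class FakeFpga : public GenericHardware {
public:
    std::string device_name() const override { return "fake-fpga"; }
    std::vector<double> run(const std::string& kernel,
                            const std::vector<double>& input) override {
        assert(kernel == "double_values");  // only kernel this sketch knows
        std::vector<double> out;
        out.reserve(input.size());
        for (double x : input) out.push_back(2.0 * x);
        return out;
    }
};

// Hypothetical resource manager: decides at runtime which device, if any,
// serves a kernel, and hands back an object bound to that device.
class ResourceManager {
public:
    using Factory = std::function<std::unique_ptr<GenericHardware>()>;

    void offer(const std::string& kernel, Factory make_device) {
        available_[kernel] = std::move(make_device);
    }

    // Returns nullptr when no special-purpose processor is available.
    std::unique_ptr<GenericHardware> request(const std::string& kernel) const {
        auto it = available_.find(kernel);
        if (it == available_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> available_;
};

struct Result {
    std::vector<double> values;
    std::string ran_on;
};

// Library element as the application sees it: no device-specific code.
Result double_values(const ResourceManager& manager,
                     const std::vector<double>& input) {
    if (auto hardware = manager.request("double_values")) {
        return {hardware->run("double_values", input), hardware->device_name()};
    }
    // Assumed software fallback when the manager grants no device.
    std::vector<double> out;
    out.reserve(input.size());
    for (double x : input) out.push_back(2.0 * x);
    return {out, "software"};
}
```

In this sketch the caller of double_values is identical whether or not a device has been offered to the manager; only the ran_on field of the result differs, which mirrors the claim that the application code and binary contain nothing specific to the special-purpose processor.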