Shared instruction-set extensions for soft multiprocessor systems implemented on field-programmable gate arrays
Soft-core embedded systems implemented on FPGAs offer a high level of flexibility. Application-specific customizations can be added in the form of extensions to the processor’s regular instruction set. These custom instructions benefit run-time performance, but come at the cost of increased resource...
Main Author: | Johnston, Erin |
---|---|
Language: | English |
Published: | University of British Columbia, 2012 |
Online Access: | http://hdl.handle.net/2429/43695 |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-43695 |
---|---|
record_format | oai_dc |
spelling | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-43695 2014-03-26T03:39:11Z Shared instruction-set extensions for soft multiprocessor systems implemented on field-programmable gate arrays Johnston, Erin 2012-12-13T21:18:58Z 2012-12-13T21:18:58Z 2012 2012-12-13 2013-05 Electronic Thesis or Dissertation http://hdl.handle.net/2429/43695 eng University of British Columbia |
collection | NDLTD |
language | English |
sources | NDLTD |
description | Soft-core embedded systems implemented on FPGAs offer a high level of flexibility. Application-specific customizations can be added in the form of extensions to the processor’s regular instruction set. These custom instructions benefit run-time performance, but come at the cost of increased resource usage. Reducing the overall FPGA area required to implement a system will decrease static power consumption and allow a smaller, cheaper device to be used. There is a constant effort to reduce area and power consumption while maintaining performance benefits attained through customizations. This thesis presents a new architecture to share custom instruction units among multiple processors in a system. This implementation allows run-time performance benefits to be maintained while decreasing the overall resource usage. The shared architecture is implemented using an arbitrator to determine processor access to each custom instruction in a set. Custom instruction inputs and outputs are controlled using additional multiplexors and selection hardware. Results for a sample system using fine-grained custom instructions show that sharing can reduce the implementation area by up to 24% with minimal impact on the critical path delay. This reduction remains high at 19% for a coarse-grained case study of an encryption algorithm called SHA. The custom instruction configuration depends on the application being performed. A benchmark generator and simulator are also developed to evaluate candidates for custom instruction implementation and efficiently explore the design space. The overall run-time performance of the candidate systems can also be evaluated using these tools. The simulator can also be used with an input trace to determine cycle-accurate run-time performance for a real application, without requiring the entire system to be designed and implemented in hardware. The simulator shows up to 53% run-time improvement for a shared fine-grained system over a system with no custom instructions. Hardware results for the coarse-grained case study show a run-time improvement of up to 13.5% over a system with no custom instructions. |
author | Johnston, Erin |
title | Shared instruction-set extensions for soft multiprocessor systems implemented on field-programmable gate arrays |
publisher | University of British Columbia |
publishDate | 2012 |
url | http://hdl.handle.net/2429/43695 |
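
The description in this record mentions an arbitrator that decides which processor may use a shared custom instruction unit, with multiplexers routing inputs and outputs. The record does not say which arbitration policy the thesis uses, so the following is only a minimal behavioural sketch in Python that assumes a round-robin policy. The names `round_robin_grant`, `shared_unit_cycle`, and `custom_op` are hypothetical and are not taken from the thesis.

```python
def round_robin_grant(requests, last_grant):
    """Pick the next requesting processor after last_grant (assumed policy).

    requests: list of booleans, one per processor.
    Returns the granted processor index, or None if nobody is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None


def shared_unit_cycle(requests, operands, last_grant, custom_op):
    """Model one cycle of a shared custom instruction unit.

    The arbiter selects one processor, an input multiplexer forwards that
    processor's operands to the unit, and the result is returned together
    with the index of the processor that should receive it.
    """
    grant = round_robin_grant(requests, last_grant)
    if grant is None:
        return last_grant, None      # unit idle, arbiter state unchanged
    a, b = operands[grant]           # input multiplexer
    return grant, custom_op(a, b)    # output routed back to `grant`


# Hypothetical example: three processors, processors 0 and 2 request the unit.
requests = [True, False, True]
operands = [(1, 2), (3, 4), (5, 6)]
multiply_add_one = lambda a, b: a * b + 1   # stand-in custom instruction

grant, result = shared_unit_cycle(requests, operands, 0, multiply_add_one)
```

In this sketch a processor that is not granted access would simply retry on a later cycle, which is where any run-time cost of sharing would show up. How the real architecture handles stalls is not stated in this record.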