Optimizing network-on-chips for FPGAs
Main Author: | Kwa, Jimmy Williamchingyuan |
---|---|
Language: | English |
Published: | University of British Columbia, 2013 |
Online Access: | http://hdl.handle.net/2429/44343 |
id | ndltd-UBC-oai-circle.library.ubc.ca-2429-44343 |
---|---|
record_format | oai_dc |
type | Text; Thesis/Dissertation |
date | 2013-04-19T20:50:58Z; 2013-04-20T09:13:35Z; 2013; 2013-05 |
rights | Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/) |
record datestamp | 2018-01-05T17:26:35Z |
collection | NDLTD |
sources | NDLTD |
description | As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low-cost, high-performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC, such as higher throughput and deadlock prevention, but at significant resource cost when implemented on an FPGA. This thesis presents an FPGA-specific optimization to reduce resource utilization. We propose sharing Block RAMs between multiple router ports to store the VC buffers, which otherwise consume a large share of logic resources, and present the Block RAM Split (BRS) router architecture that implements the proposed optimization. We evaluate the performance of the modifications using synthetic traffic patterns on mesh and torus networks, and synthesize the NoCs to determine overall resource usage and maximum clock frequency. We find that the additional logic needed to share Block RAMs has little impact on Adaptive Logic Module (ALM) usage in designs that already use Block RAMs, while decreasing Block RAM usage by as much as 40%. Compared with CONNECT, a router design that does not use Block RAMs, a 71% reduction in ALM usage is shown to be possible. This resource reduction comes at the cost of a 15% reduction in saturation throughput for uniform random traffic and a 50% reduction for the worst-case neighbour traffic pattern on a mesh network. The throughput penalty for the neighbour traffic pattern can be reduced to 3% if a torus network is used. In all cases, network latency at low load changes little. BRS can run at 161.71 MHz, a decrease of only 4% from the base VC router design. Determining the optimum NoC topology is a challenging task; this thesis also presents initial work towards an analytical model to help find the best topology for an FPGA NoC. |
faculty | Faculty of Applied Science |
department | Department of Electrical and Computer Engineering |
degree level | Graduate |
author | Kwa, Jimmy Williamchingyuan |
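The central idea in the abstract, storing the VC buffers of several router ports in one shared Block RAM, can be illustrated with a small behavioural model. This is a hypothetical sketch, not the BRS architecture described in the thesis. The class name `SharedBramVcBuffers`, the contiguous per-(port, VC) address layout, the limit of two memory accesses per cycle (modelled loosely on a true dual-port memory), and the first-come arbitration are all assumptions made for illustration.

```python
class SharedBramVcBuffers:
    """Hypothetical model: VC FIFOs of several router ports packed into one BRAM.

    Assumptions (illustrative only): every port has the same number of VCs,
    every VC buffer holds vc_depth flits, and the memory can serve at most
    bram_ports accesses per cycle (2 loosely mimics a true dual-port RAM).
    """

    def __init__(self, num_ports, num_vcs, vc_depth, bram_ports=2):
        self.num_vcs = num_vcs
        self.vc_depth = vc_depth
        self.bram_ports = bram_ports
        # A flat list stands in for the single shared Block RAM.
        self.mem = [None] * (num_ports * num_vcs * vc_depth)
        # Circular-FIFO state (read pointer and occupancy) per (port, vc) region.
        self.head = [[0] * num_vcs for _ in range(num_ports)]
        self.count = [[0] * num_vcs for _ in range(num_ports)]

    def base_address(self, port, vc):
        """First memory address of the contiguous region owned by (port, vc)."""
        return (port * self.num_vcs + vc) * self.vc_depth

    def _slot(self, port, vc, offset):
        # Wrap within the (port, vc) region so neighbouring FIFOs never overlap.
        return self.base_address(port, vc) + (self.head[port][vc] + offset) % self.vc_depth

    def cycle(self, requests):
        """Serve up to bram_ports requests in list order; return (flits_read, stalled).

        A request is ("write", port, vc, flit) or ("read", port, vc).
        Requests beyond the per-cycle access limit are returned unserved.
        """
        granted = requests[: self.bram_ports]
        stalled = requests[self.bram_ports :]
        flits_read = []
        for req in granted:
            kind, port, vc = req[0], req[1], req[2]
            if kind == "write":
                if self.count[port][vc] == self.vc_depth:
                    raise OverflowError(f"VC buffer ({port}, {vc}) is full")
                self.mem[self._slot(port, vc, self.count[port][vc])] = req[3]
                self.count[port][vc] += 1
            elif kind == "read":
                if self.count[port][vc] == 0:
                    raise IndexError(f"VC buffer ({port}, {vc}) is empty")
                flits_read.append(self.mem[self._slot(port, vc, 0)])
                self.head[port][vc] = (self.head[port][vc] + 1) % self.vc_depth
                self.count[port][vc] -= 1
            else:
                raise ValueError(f"unknown request kind: {kind!r}")
        return flits_read, stalled


# Two ports x two VCs x four flits = 16 shared memory entries.
buf = SharedBramVcBuffers(num_ports=2, num_vcs=2, vc_depth=4)
flits, stalled = buf.cycle([("write", 0, 1, "A"), ("write", 1, 0, "B"), ("write", 1, 1, "C")])
print(flits, stalled)  # [] [('write', 1, 1, 'C')]
flits, stalled = buf.cycle([("read", 0, 1), ("read", 1, 0)])
print(flits, stalled)  # ['A', 'B'] []
```

In this model, three ports asking for the memory in a cycle that allows only two accesses leaves one request stalled. Under these assumptions, contention of this kind is one plausible way that sharing a memory among ports could lower saturation throughput, while the area saving comes from replacing many logic-based buffers with one memory block.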