Optimizing network-on-chips for FPGAs

As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low-cost, high-performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC, such as higher throughput and deadlock prevention, but at signific...


Bibliographic Details
Main Author: Kwa, Jimmy Williamchingyuan
Language: English
Published: University of British Columbia 2013
Online Access: http://hdl.handle.net/2429/44343
id ndltd-UBC-oai-circle.library.ubc.ca-2429-44343
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-44343 2018-01-05T17:26:35Z Optimizing network-on-chips for FPGAs Kwa, Jimmy Williamchingyuan Applied Science, Faculty of; Electrical and Computer Engineering, Department of; Graduate 2013-04-19T20:50:58Z 2013-04-20T09:13:35Z 2013 2013-05 Text Thesis/Dissertation http://hdl.handle.net/2429/44343 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low-cost, high-performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC, such as higher throughput and deadlock prevention, but at significant resource cost when implemented on an FPGA. This thesis presents an FPGA-specific optimization to reduce resource utilization. We propose sharing Block RAMs between multiple router ports to store the VC buffers, which otherwise consume substantial logic resources, and present the Block RAM Split (BRS) router architecture that implements the proposed optimization. We evaluate the performance of the modifications using synthetic traffic patterns on mesh and torus networks, and synthesize the NoCs to determine overall resource usage and maximum clock frequency. We find that the additional logic to support sharing Block RAMs has little impact on Adaptive Logic Module (ALM) usage in designs that currently use Block RAMs, while decreasing Block RAM usage by as much as 40%. Compared with CONNECT, a router design that does not use Block RAMs, we show that a 71% reduction in ALM usage is possible. This resource reduction comes at the cost of a 15% reduction in saturation throughput for uniform random traffic and a 50% decrease for the worst-case neighbour traffic pattern on a mesh network. The throughput penalty for the neighbour traffic pattern can be reduced to 3% if a torus network is used. In all cases, there is little change in network latency at low load. BRS is capable of running at 161.71 MHz, which is a decrease of only 4% from the base VC router design. Determining the optimum NoC topology is a challenging task. This thesis also proposes initial work towards the creation of an analytical model to assist with finding the best topology to use in an FPGA NoC. Applied Science, Faculty of; Electrical and Computer Engineering, Department of; Graduate