Optimizing network-on-chips for FPGAs

As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low cost and high performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC such as higher throughput and deadlock prevention but at signific...

Bibliographic Details
Main Author:	Kwa, Jimmy Williamchingyuan
Language:	English
Published:	University of British Columbia 2013
Online Access:	http://hdl.handle.net/2429/44343

id	ndltd-UBC-oai-circle.library.ubc.ca-2429-44343
record_format	oai_dc
spelling	ndltd-UBC-oai-circle.library.ubc.ca-2429-44343 2018-01-05T17:26:35Z Optimizing network-on-chips for FPGAs Kwa, Jimmy Williamchingyuan Applied Science, Faculty of Electrical and Computer Engineering, Department of Graduate 2013-04-19T20:50:58Z 2013-04-20T09:13:35Z 2013 2013-05 Text Thesis/Dissertation http://hdl.handle.net/2429/44343 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia
collection	NDLTD
language	English
sources	NDLTD
description	As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low cost and high performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC such as higher throughput and deadlock prevention but at significant resource cost when implemented on an FPGA. This thesis presents an FPGA specific optimization to reduce resource utilization. We propose sharing Block RAMs between multiple router ports to store the high logic resource consuming VC buffers and present the Block RAM Split (BRS) router architecture that implements the proposed optimization. We evaluate the performance of the modifications using synthetic traffic patterns on mesh and torus networks and synthesize the NoCs to determine overall resource usage and maximum clock frequency. We find that the additional logic to support sharing Block RAMs has little impact on Adaptive Logic Module (ALM) usage in designs that currently use Block RAMs while at the same time decreasing Block RAM usage by as much as 40%. In comparison to CONNECT, a router design that does not use Block RAMs, a 71% reduction in ALM usage is shown to be possible. This resource reduction comes at the cost of a 15% reduction in the saturation throughput for uniform random traffic and a 50% decrease in the worst case neighbour traffic pattern on a mesh network. The throughput penalty from the neighbour traffic pattern can be reduced to 3% if a torus network is used. In all cases, there is little change in network latency at low load. BRS is capable of running at 161.71 MHz which is a decrease of only 4% from the base VC router design. Determining the optimum NoC topology is a challenging task. This thesis also proposes initial work towards the creation of an analytical model to assist with finding the best topology to use in an FPGA NoC. === Applied Science, Faculty of === Electrical and Computer Engineering, Department of === Graduate
author	Kwa, Jimmy Williamchingyuan
spellingShingle	Kwa, Jimmy Williamchingyuan Optimizing network-on-chips for FPGAs
author_facet	Kwa, Jimmy Williamchingyuan
author_sort	Kwa, Jimmy Williamchingyuan
title	Optimizing network-on-chips for FPGAs
title_short	Optimizing network-on-chips for FPGAs
title_full	Optimizing network-on-chips for FPGAs
title_fullStr	Optimizing network-on-chips for FPGAs
title_full_unstemmed	Optimizing network-on-chips for FPGAs
title_sort	optimizing network-on-chips for fpgas
publisher	University of British Columbia
publishDate	2013
url	http://hdl.handle.net/2429/44343
work_keys_str_mv	AT kwajimmywilliamchingyuan optimizingnetworkonchipsforfpgas
_version_	1718583793350082560

Similar Items