Architecture of block-RAM-based massively parallel memory structures : multi-ported memories and content-addressable memories

Since they were first introduced three decades ago, Field-Programmable Gate Arrays (FPGAs) have evolved from being merely used as glue-logic to implementing entire compute accelerators. These massively parallel systems demand highly parallel memory structures to keep pace with their concurrent natur...

Full description

Bibliographic Details
Main Author: Abdelhadi, Ameer M. S.
Language:English
Published: University of British Columbia 2016
Online Access:http://hdl.handle.net/2429/59146
Description
Summary:Since they were first introduced three decades ago, Field-Programmable Gate Arrays (FPGAs) have evolved from being merely used as glue-logic to implementing entire compute accelerators. These massively parallel systems demand highly parallel memory structures to keep pace with their concurrent nature since memories are usually the bottleneck of computation performance. However, the vast majority of FPGA devices provide dual-ported SRAM blocks only. In this dissertation, we propose new ways to build area-efficient, high-performance SRAM-based parallel memory structures in FPGAs, specifically Multi-Ported Random Access Memory and Content-Addressable Memory (CAM). While parallel computation demands more RAM ports, leading Multi-Ported Random Access Memory techniques in FPGAs have relatively large overhead in resource usage. As a result, we have produced new design techniques that are near-optimal in resource overhead and have several practical advantages. The suggested method reduces RAM usage by over 44% and improves clock speed by over 76% compared to the best of previous approaches. Furthermore, we propose a novel switched-ports technique that allows further area reduction if some RAM ports are not simultaneously active. A memory compiler is proposed to generalize the previous approach and allow generating Multi-Switched-Ports Random Access Memory. Content-Addressable Memories (CAMs), the hardware implementation of associative arrays, are capable of searching the entire memory space for a specific value within a single clock cycle. CAMs are massively parallel search engines accessing all memory content to compare with the searched pattern simultaneously. CAMs are used in a variety of scientific fields requiring high-speed associative searches. Despite their importance, FPGAs lack an area-efficient CAM implementation. We propose a series of scalable, area-efficient, and high-performance Binary Content-Addressable Memories (BCAMs) based on hierarchical search and data compression methods. Compared to current RAM-based BCAM architectures, our BCAMs require a maximum of 18% the RAM storage while enhancing clock speed by 45% on average, hence exhibiting a superior single-cycle search rate. As a result, we can build faster and more cost-effective accelerators to solve some of the most important computational problems. === Applied Science, Faculty of === Electrical and Computer Engineering, Department of === Graduate