Implementation of coarse-grain coherence tracking support in ring-based multiprocessors

Bibliographic Details
Main Author: Coté, Edmond A.
Other Authors: Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
Format: Others
Language: en
Published: 2007
Subjects: Cache coherence; Ring-based multiprocessor; Coarse-grain coherence tracking; Prototype implementation; Multiprocessor system-on-chip
Online Access: http://hdl.handle.net/1974/882

As the number of processors in multiprocessor system-on-chip devices continues to increase, the complexity required for full cache coherence support is often unwarranted for application-specific designs. Bus-based interconnects are no longer suitable for larger-scale systems, and the logic and storage overhead of a complex packet-switched network with directory-based cache coherence may be undesirable in single-chip systems. Unidirectional rings are a suitable alternative because they offer many properties favorable both to on-chip implementation and to supporting cache coherence. Reducing the overhead of cache coherence traffic is, however, a concern for these systems. This thesis adapts two filter structures based on the principles of coarse-grained coherence tracking and applies them to a ring-based multiprocessor. The first structure tracks, for each of a set of regions, the total number of blocks of remote data cached by all processors in a node; a region is a large area of memory referenced by the upper bits of an address. The second structure records regions of local data whose contents are not cached by any remote node. When used together to filter incoming or outgoing requests, these structures reduce the amount of coherence traffic and limit the transmission of coherent requests to the necessary parts of the system.

A complete single-chip multiprocessor system that includes the proposed filters is designed and implemented in programmable logic for this thesis. The system is composed of nodes of bus-based multiprocessors. Each node includes a common memory, two or more pipelined 32-bit processors with coherent data caches, a split-transaction bus with separate lines for requests and responses, and an interface to the system-level ring interconnect. Two coarse-grained filters are attached to each node to reduce the impact of coherence traffic on the system.
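
The two filters can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis's hardware design: the 4 KiB region size (REGION_BITS = 12), the class and method names, and the policy of recording a region as "not remotely cached" once a ring request for it returns with no remote copies are all hypothetical. Unbounded Python dictionaries and sets stand in for what would be fixed-size tables in programmable logic.

```python
# Hypothetical region size (not taken from the thesis): 4 KiB regions.
REGION_BITS = 12


def region_of(addr: int) -> int:
    """A region is named by the address bits above REGION_BITS."""
    return addr >> REGION_BITS


class RemoteRegionCounter:
    """First filter (sketch): per-node count of cached remote-data blocks, per region."""

    def __init__(self) -> None:
        self.counts: dict[int, int] = {}

    def on_fill(self, addr: int) -> None:
        # A processor in this node allocated a block homed at another node.
        r = region_of(addr)
        self.counts[r] = self.counts.get(r, 0) + 1

    def on_evict(self, addr: int) -> None:
        # The block left a cache (eviction or invalidation); free the entry at zero.
        r = region_of(addr)
        remaining = self.counts[r] - 1  # KeyError if the block was never counted
        if remaining:
            self.counts[r] = remaining
        else:
            del self.counts[r]

    def must_snoop(self, addr: int) -> bool:
        # An incoming ring request reaches this node's bus only if some
        # processor here may hold a block from the same region.
        return self.counts.get(region_of(addr), 0) > 0


class LocalNotCachedRegions:
    """Second filter (sketch): local regions known not to be cached remotely."""

    def __init__(self) -> None:
        self.regions: set[int] = set()

    def on_ring_reply(self, addr: int, remote_cached: bool) -> None:
        # Assumed policy: record the region when a ring request for local
        # data returns and no remote node reports holding any of its blocks.
        if not remote_cached:
            self.regions.add(region_of(addr))

    def on_remote_access(self, addr: int) -> None:
        # A remote node is about to cache data from this region: forget it.
        self.regions.discard(region_of(addr))

    def must_send_on_ring(self, addr: int) -> bool:
        # Applies to requests for local data only; if the region is known not
        # to be cached remotely, the request can be satisfied inside the node.
        return region_of(addr) not in self.regions


if __name__ == "__main__":
    counter = RemoteRegionCounter()
    counter.on_fill(0x1_0040)
    counter.on_fill(0x1_0080)                # same region, 0x10
    counter.on_evict(0x1_0040)
    print(counter.must_snoop(0x1_0FFF))      # True: one block of region 0x10 remains
    print(counter.must_snoop(0x2_0000))      # False: nothing from region 0x20 is cached

    local = LocalNotCachedRegions()
    local.on_ring_reply(0x3000, remote_cached=False)
    print(local.must_send_on_ring(0x3ABC))   # False: region 3 stays inside the node
    local.on_remote_access(0x3004)
    print(local.must_send_on_ring(0x3ABC))   # True: a remote node may now cache region 3
```

Counting blocks, rather than keeping a single presence bit per region, lets the first filter see exactly when the last remote block of a region leaves the node's caches, so the region can be filtered again without scanning the caches.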

Cache coherence within a node is enforced through bus snooping, while coherence across the interconnect is supported by a reduced-complexity ring snooping protocol. Main memory is globally shared and physically distributed among the nodes. Results are presented to highlight the system's key implementation points: synthesis results evaluate the hardware overhead, and operational results demonstrate the functionality of the multiprocessor system and of the filter structures.

Thesis (Master, Electrical & Computer Engineering), Queen's University, 2007-10-24.

Financial support for this work was provided by the Natural Sciences and Engineering Research Council of Canada, Communications and Information Technology Ontario, and Queen's University.
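
The path of a coherent request around a unidirectional ring, and the point where a filter keeps it off a node's bus, can be sketched in the same way. The sketch assumes four nodes, a home node selected by address bits 24 and up, 4 KiB regions, and a snoop that visits every other node once before returning to the requester; none of these parameters, nor the function names, come from the thesis. The home node is always snooped here because its own caches may hold the block as local data, which the remote-data counters do not track.

```python
# Hypothetical parameters (not from the thesis): four nodes, each homing a
# contiguous 16 MiB slice of the shared address space, and 4 KiB regions.
NUM_NODES = 4
NODE_SHIFT = 24
REGION_BITS = 12


def home_node(addr: int) -> int:
    """Node whose local memory holds addr (assumed high-order-bit mapping)."""
    return (addr >> NODE_SHIFT) % NUM_NODES


def ring_order(src: int) -> list[int]:
    """Nodes a request visits on a unidirectional ring before returning to src."""
    return [(src + i) % NUM_NODES for i in range(1, NUM_NODES)]


def snoop_path(src: int, addr: int,
               remote_counts: list[dict[int, int]]) -> tuple[list[int], list[int]]:
    """Return (nodes the request passes through, nodes whose bus is snooped).

    remote_counts[n] maps region -> number of remote-data blocks cached in node n.
    A node other than the home node forwards the request onto its bus only if
    it may hold a block from the same region.
    """
    region = addr >> REGION_BITS
    home = home_node(addr)
    visited = ring_order(src)
    snooped = [n for n in visited
               if n == home or remote_counts[n].get(region, 0) > 0]
    return visited, snooped


if __name__ == "__main__":
    counts = [{}, {0x2001: 2}, {}, {}]   # only node 1 caches remote blocks of region 0x2001
    visited, snooped = snoop_path(0, 0x0200_1234, counts)
    print(visited)   # [1, 2, 3]
    print(snooped)   # [1, 2]  (node 1 holds copies, node 2 is the home node)
```

In this sketch the filter never shortens the ring path itself; every request still returns to its sender along the same fixed route, and the filter only spares node buses that cannot hold the requested data.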