Fast and Scalable Simulation Framework for Large In-Order Chip Multiprocessors

As chip technology advances, the number of cores in mainstream chip multiprocessors (CMPs) increases, so chips with hundreds of cores may become common within a decade. One of the challenges this trend poses for computer architects is making current CMP designs scalable to larger numbers of cores...


Bibliographic Details
Main Author: Yuri Nedbailo
Format: Article
Language: English
Published: FRUCT 2020-04-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Subjects: chip multi-processors; trace-based simulation; kilo-core
Online Access: https://www.fruct.org/publications/fruct26/files/Ned.pdf
citation Proceedings of the XXth Conference of Open Innovations Association FRUCT, vol. 26, no. 1, pp. 335-345 (2020-04-01)
doi 10.23919/FRUCT48808.2020.9087481
author_affiliation MCST, Russia
collection DOAJ
sources DOAJ
issn 2305-7254
2343-0737
description As chip technology advances, the number of cores in mainstream chip multiprocessors (CMPs) increases, so chips with hundreds of cores may become common within a decade. One of the challenges this trend poses for computer architects is making current CMP designs scalable to larger numbers of cores. A tool set that allows us to predict how various design decisions may affect the performance of larger CMPs is therefore necessary. In this paper, we present a trace-based simulation framework that we devised for the Elbrus microprocessor family. Its core component, the CMP simulator, scales to at least one thousand cores and makes it possible to evaluate kilo-core CMP performance in just a few days on a mainstream 16-core host computer. It is also highly flexible and architecture-agnostic and could therefore be used to simulate other in-order architectures. We validated the framework against a real machine and achieved an average accuracy of 18 percent in single-core tests and 15 percent in four-core tests, an average error in relative slowdown evaluation of 2.6 percent, and average absolute errors in L2 and L3 cache miss rates within 0.3 bytes per cycle.
topic chip multi-processors
trace-based simulation
kilo-core