Reusing cached schedules in an out-of-order processor with in-order issue logic
Modern processors use out-of-order processing logic to achieve high performance in Instructions Per Cycle (IPC) but this logic has a serious impact on the achievable frequency. In order to get better performance out of smaller transistors there is a trend to increase the number of cores per die inst...
Main Author: | Palomar Pérez, Óscar |
---|---|
Other Authors: | Navarro, Juan J. (Juan José) |
Format: | Doctoral Thesis |
Language: | English |
Published: | Universitat Politècnica de Catalunya, 2011 |
Subjects: | Issue logic; In-order processor; Out-of-order processor; 004 |
Online Access: | http://hdl.handle.net/10803/80536 |
id |
ndltd-TDX_UPC-oai-www.tdx.cat-10803-80536 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Issue logic; In-order processor; Out-of-order processor; 004 |
description |
Modern processors use out-of-order processing logic to achieve high performance in Instructions Per Cycle (IPC) but this logic has a serious impact on
the achievable frequency. In order to get better performance out of smaller transistors there is a trend to increase the number of cores per die instead of
making the cores themselves bigger. Moreover, for throughput-oriented and server workloads, simpler in-order processors that allow more cores per die
and higher design frequencies are becoming the preferred choice. Unfortunately, for other workloads this type of core results in lower single-thread
performance.
There are many workloads where it is still important to achieve good single-thread performance. In this thesis we present the ReLaSch processor.
Its aim is to enable high-IPC cores capable of running at high clock frequencies by processing the instructions using simple superscalar in-order issue
logic and caching instruction groups that are dynamically scheduled in hardware after commit, that is, off the critical path and only when really
needed.
Objective
This thesis has several research goals:
• Show that the dynamic scheduler of a conventional out-of-order processor does a lot of redundant work because it ignores the
repetitiveness of code.
• Propose a complete superscalar out-of-order architecture that reduces the amount of redundant work done by creating the
schedules once in dedicated hardware, storing them in a cache of schedules and reusing the schedules as much as possible.
• Place the scheduler out of the critical path of execution, which should be enabled by the reduction of work that it must do. Thus,
the execution path of our proposed processor can be simpler than that of a conventional out-of-order processor.
Proposal and results
We present the ReLaSch processor, named after Reused Late Schedules, which removes the creation of issue groups from the critical
path of execution and uses simple, small in-order issue logic. It just wakes up and selects the instructions of a single issue group each cycle,
instead of processing the instructions of a whole issue queue.
New logic at the end of the conventional pipeline schedules the committed instructions. The new scheduler can be complex since it is not on the critical
path of execution. The schedules are cached and, whenever possible, an rgroup is read and its instructions executed. The schedules are reused,
lowering the pressure on the scheduling logic.
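The following is a minimal software sketch of the idea in the paragraph above, not the thesis's hardware design. It assumes unit latencies, true register dependences only (renaming is assumed to have removed false ones), no memory or branch handling, an issue width of 2, and a schedule cache keyed by the starting PC of a block. The names `Instr`, `schedule` and `run` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int
    dst: Optional[str]            # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers


def schedule(trace: List[Instr], width: int = 2) -> List[List[Instr]]:
    """Pack a committed trace into issue groups (greedy list scheduling).

    Assumption: unit latency, so a consumer can go in the group right
    after its producer. Only true dependences are tracked.
    """
    groups: List[List[Instr]] = []
    ready_at: Dict[str, int] = {}  # register -> first group allowed to read it
    for ins in trace:
        # Earliest group in which all source values are available.
        g = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        # Skip groups that already hold `width` instructions.
        while g < len(groups) and len(groups[g]) >= width:
            g += 1
        if g == len(groups):
            groups.append([])
        groups[g].append(ins)
        if ins.dst is not None:
            ready_at[ins.dst] = g + 1
    return groups


def run(blocks: List[List[Instr]], cache: Dict[int, List[List[Instr]]],
        width: int = 2) -> int:
    """Return how often the scheduler had to run; cached schedules are reused."""
    scheduler_calls = 0
    for block in blocks:
        key = block[0].pc          # hypothetical cache index: block start PC
        if key not in cache:
            cache[key] = schedule(block, width)
            scheduler_calls += 1
    return scheduler_calls


# A four-instruction loop body: two independent two-instruction chains.
body = [
    Instr(0, "r1"),
    Instr(4, "r2", ("r1",)),
    Instr(8, "r3"),
    Instr(12, "r4", ("r3",)),
]
group_pcs = [[i.pc for i in g] for g in schedule(body)]
cache: Dict[int, List[List[Instr]]] = {}
calls = run([body, body, body], cache)  # the same body executed three times
```

In this toy example the body is packed into two groups (PCs 0 and 8, then PCs 4 and 12), and the scheduler runs only once for three executions of the body, which is the kind of reuse the abstract describes.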
In some cases, the ReLaSch processor is able to outperform a conventional out-of-order processor, because the post-commit scheduler has a broader
view of the code. For instance, while ReLaSch can schedule together two independent instructions that are distant in the code, a conventional out-of-order
processor only issues them in the same cycle if both are in-flight.
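The effect of a limited view of the code can be illustrated with a toy model. This is a sketch under stated assumptions, not the simulator used in the thesis: each cycle the issue logic sees only the first `window` not-yet-issued instructions in program order, issues at most `width` of them, and all latencies are one cycle. The function name `cycles_to_issue` and the dependence encoding are hypothetical.

```python
from typing import Dict, List, Set


def cycles_to_issue(deps: List[Set[int]], window: int, width: int = 2) -> int:
    """Count cycles needed to issue every instruction.

    deps[i] holds the indices of earlier instructions that i depends on.
    Assumes window >= 1, width >= 1 and dependences on earlier indices only.
    """
    issued_at: Dict[int, int] = {}   # instruction index -> issue cycle
    cycle = 0
    n = len(deps)
    while len(issued_at) < n:
        cycle += 1
        # Only the oldest `window` unissued instructions are visible.
        pending = [i for i in range(n) if i not in issued_at][:window]
        # Ready means every producer issued in an earlier cycle.
        ready = [
            i for i in pending
            if all(issued_at.get(d, cycle) < cycle for d in deps[i])
        ]
        for i in ready[:width]:
            issued_at[i] = cycle
    return cycle


# A chain of four dependent instructions followed by four independent ones.
deps = [set(), {0}, {1}, {2}, set(), set(), set(), set()]
narrow = cycles_to_issue(deps, window=2)   # small view of the code
broad = cycles_to_issue(deps, window=8)    # sees the whole trace
```

With the narrow window the independent instructions at the end cannot be paired with the dependent chain, so the toy model needs 6 cycles; with the whole trace visible it needs 4.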
The ReLaSch processor predicts the branch targets, memory aliases and latencies at scheduling time, out of the critical path. The prediction is based
on the most recent executions at scheduling time. Furthermore, most of the register renaming process is performed by the scheduler and is removed
from the execution pipeline.
Our experiments show that ReLaSch has the same average IPC as our reference out-of-order processor and is clearly better than the reference in-order
processor (a speed-up of 1.55). In all cases it outperforms the in-order processor, and in 23 benchmarks out of 40 it has a higher IPC than the
reference out-of-order processor. |
author2 |
Navarro, Juan J. (Juan José) |
author |
Palomar Pérez, Óscar |
title |
Reusing cached schedules in an out-of-order processor with in-order issue logic |
publisher |
Universitat Politècnica de Catalunya |
publishDate |
2011 |
url |
http://hdl.handle.net/10803/80536 |
spelling |
ndltd-TDX_UPC-oai-www.tdx.cat-10803-80536
2013-07-11T03:41:02Z
Reusing cached schedules in an out-of-order processor with in-order issue logic
Palomar Pérez, Óscar
Issue logic; In-order processor; Out-of-order processor; 004
Universitat Politècnica de Catalunya
Navarro, Juan J. (Juan José)
Hormigo, Antonio Juan
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
2011-05-09
info:eu-repo/semantics/doctoralThesis
info:eu-repo/semantics/publishedVersion
204 p.
application/pdf
http://hdl.handle.net/10803/80536
B. 17062-2012
TDX (Tesis Doctorals en Xarxa)
eng
info:eu-repo/semantics/openAccess
WARNING. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for reference or private study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Other uses require the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes. |