Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques

Memory accesses in modern processors are both far slower and vastly more energy-expensive than the actual computations. To improve performance, processors spend a significant amount of energy and resources trying to hide and reduce the memory latency. To hide the latency, processors use out-order-or...


Bibliographic Details
Main Author:	Sembrant, Andreas
Format:	Doctoral Thesis
Language:	English
Published:	Uppsala universitet, Avdelningen för datorteknik 2016
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-306369 http://nbn-resolving.de/urn:isbn:978-91-554-9744-6

id	ndltd-UPSALLA1-oai-DiVA.org-uu-306369
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-uu-306369 2016-11-29T05:58:21Z Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques eng Sembrant, Andreas Uppsala universitet, Avdelningen för datorteknik Uppsala universitet, Datorteknik Uppsala 2016 Memory accesses in modern processors are both far slower and vastly more energy-expensive than the actual computations. To improve performance, processors spend a significant amount of energy and resources trying to hide and reduce the memory latency. To hide the latency, processors use out-of-order execution to overlap memory accesses with independent work and aggressive speculative instruction scheduling to execute dependent instructions back-to-back. To reduce the latency, processors use several levels of caching that keep frequently used data closer to the processor. However, these optimizations are not free. Out-of-order execution requires expensive processor resources, speculative scheduling must re-execute instructions on incorrect speculations, and multi-level caching requires extra energy and latency to search the cache hierarchy. This thesis investigates several energy-efficient techniques for: 1) hiding the latency in the processor pipeline, and 2) reducing the latency in the memory hierarchy. Many of the inefficiencies of hiding latency in the processor come from two sources. First, processors need several large and expensive structures to do out-of-order execution (instruction queue, register file, etc.). These resources are typically allocated in program order, effectively giving all instructions equal priority. To reduce the size of these expensive resources without hurting performance, we propose Long Term Parking (LTP). LTP parks non-critical instructions before they allocate resources, thereby making room for critical memory-accessing instructions to continue and expose more memory-level parallelism. 
This enables us to save energy by shrinking the resource sizes without hurting performance. Second, when a load's data returns, the load's dependent instructions need to be scheduled and executed. To execute the dependent instructions back-to-back, the processor will speculatively schedule instructions before the processor knows if the input data will be available at execution time. To save energy, we investigate different scheduling techniques that reduce the number of re-executions due to misspeculation. The inefficiencies of traditional memory hierarchies come from the need to do level-by-level searches to locate data. The search starts at the L1 cache, then proceeds level by level until the data is found, or determined not to be in any cache, at which point the processor has to fetch the data from main memory. This wastes time and energy for every level that is searched. To reduce the latency, we propose tracking the location of the data directly in a separate metadata hierarchy. This allows us to directly access the data without needing to search. The processor simply queries the metadata hierarchy for the location information about where the data is stored. Separating metadata into its own hierarchy brings a wide range of additional benefits, including flexibility in how we place data storage in the hierarchy, the ability to intelligently store data in the hierarchy, direct access to remote cores, and many other data-oriented optimizations that can leverage our precise knowledge of where data are located. Doctoral thesis, comprehensive summary info:eu-repo/semantics/doctoralThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-306369 urn:isbn:978-91-554-9744-6 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, 1651-6214 ; 1450 application/pdf info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Doctoral Thesis
sources	NDLTD
description	Memory accesses in modern processors are both far slower and vastly more energy-expensive than the actual computations. To improve performance, processors spend a significant amount of energy and resources trying to hide and reduce the memory latency. To hide the latency, processors use out-of-order execution to overlap memory accesses with independent work and aggressive speculative instruction scheduling to execute dependent instructions back-to-back. To reduce the latency, processors use several levels of caching that keep frequently used data closer to the processor. However, these optimizations are not free. Out-of-order execution requires expensive processor resources, speculative scheduling must re-execute instructions on incorrect speculations, and multi-level caching requires extra energy and latency to search the cache hierarchy. This thesis investigates several energy-efficient techniques for: 1) hiding the latency in the processor pipeline, and 2) reducing the latency in the memory hierarchy. Many of the inefficiencies of hiding latency in the processor come from two sources. First, processors need several large and expensive structures to do out-of-order execution (instruction queue, register file, etc.). These resources are typically allocated in program order, effectively giving all instructions equal priority. To reduce the size of these expensive resources without hurting performance, we propose Long Term Parking (LTP). LTP parks non-critical instructions before they allocate resources, thereby making room for critical memory-accessing instructions to continue and expose more memory-level parallelism. This enables us to save energy by shrinking the resource sizes without hurting performance. Second, when a load's data returns, the load's dependent instructions need to be scheduled and executed. 
To execute the dependent instructions back-to-back, the processor will speculatively schedule instructions before the processor knows if the input data will be available at execution time. To save energy, we investigate different scheduling techniques that reduce the number of re-executions due to misspeculation. The inefficiencies of traditional memory hierarchies come from the need to do level-by-level searches to locate data. The search starts at the L1 cache, then proceeds level by level until the data is found, or determined not to be in any cache, at which point the processor has to fetch the data from main memory. This wastes time and energy for every level that is searched. To reduce the latency, we propose tracking the location of the data directly in a separate metadata hierarchy. This allows us to directly access the data without needing to search. The processor simply queries the metadata hierarchy for the location information about where the data is stored. Separating metadata into its own hierarchy brings a wide range of additional benefits, including flexibility in how we place data storage in the hierarchy, the ability to intelligently store data in the hierarchy, direct access to remote cores, and many other data-oriented optimizations that can leverage our precise knowledge of where data are located.
author	Sembrant, Andreas
spellingShingle	Sembrant, Andreas Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
author_facet	Sembrant, Andreas
author_sort	Sembrant, Andreas
title	Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
title_short	Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
title_full	Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
title_fullStr	Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
title_full_unstemmed	Hiding and Reducing Memory Latency : Energy-Efficient Pipeline and Memory System Techniques
title_sort	hiding and reducing memory latency : energy-efficient pipeline and memory system techniques
publisher	Uppsala universitet, Avdelningen för datorteknik
publishDate	2016
url	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-306369 http://nbn-resolving.de/urn:isbn:978-91-554-9744-6
work_keys_str_mv	AT sembrantandreas hidingandreducingmemorylatencyenergyefficientpipelineandmemorysystemtechniques
_version_	1718398541929381888


Similar Items