Efficient and Flexible Characterization of Data Locality through Native Execution Sampling

Bibliographic Details
Main Author: Berg, Erik
Format: Doctoral Thesis
Language: English
Published: Uppsala universitet, Avdelningen för datorteknik, 2005
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6012
http://nbn-resolving.de/urn:isbn:91-554-6363-0
Description
Summary: Data locality is central to modern computer designs. The widening gap between processor speed and memory latency has introduced the need for a deep hierarchy of caches. Thus, the performance of an application depends to a large extent on the amount of data locality the caches can exploit. Some data locality comes naturally from the way most programs are written and the way their data is allocated in memory. Compilers further try to create data locality through loop transformations and optimized data layout. Different ways of writing a program and/or laying out its data may improve an application's locality even more. However, it is far from obvious how such a locality optimization can be achieved, especially since the optimizing compiler may have left the optimization job half done. Thus, efficient tools are needed to guide software developers on their quest for data locality.

The main contribution of this dissertation is a novel sample-based method for analyzing the data locality of an application. Very sparse data is collected during a single execution of the studied application. The sparse sampling adds minimal overhead to the execution time, which enables complex applications running realistic data sets to be studied. The architecture-independent information collected during the execution is fed to a mathematical cache model that predicts the cache miss ratio. The sparsely collected data can be used to characterize the application's data locality with respect to almost any possible cache hierarchy, including complicated multiprocessor memory systems with multilevel cache hierarchies. Any combination of cache size, cache line size, and degree of sharing can be modeled. Each new modeled design point takes only a fraction of a second to evaluate, even though the application from which the sampled data was collected may have executed for hours. This makes the tool useful not only for software developers, but also for hardware developers who need to evaluate a huge memory-system design space.

We also discuss different ways of presenting data-locality information to a programmer in an intuitive and easily interpreted way. Some of the locality metrics we introduce exploit the flexibility of our algorithm and its ability to vary cache parameters for a single run. The dissertation also presents several prototype implementations of tools for profiling the memory system.
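
To make the abstract's idea of feeding sparsely sampled, architecture-independent execution data into a mathematical cache model more concrete, the sketch below shows one way such a model could look. It is an illustration under stated assumptions, not the thesis's actual algorithm: it assumes the samples are reuse distances (memory references between two accesses to the same cache line), models a cache with random replacement, and estimates the miss ratio by fixed-point iteration. The function name estimate_miss_ratio, its parameters, and the toy input are hypothetical.

    """
    Minimal sketch (illustrative, not the thesis's implementation) of driving a
    probabilistic cache model with sparsely sampled reuse distances.

    Assumptions:
      * A reuse distance d_i is the number of memory references between two
        accesses to the same cache line, taken from a sparse sample.
      * Random replacement: a line survives one replacement with probability
        (1 - 1/L) in a cache of L lines.
      * The miss ratio m is the fixed point of
            m = (1/N) * sum_i [ 1 - (1 - 1/L)^(m * d_i) ],
        with never-reused samples (d_i = inf) counted as misses.
    """

    import math
    from typing import Iterable


    def estimate_miss_ratio(reuse_distances: Iterable[float],
                            cache_lines: int,
                            iterations: int = 100,
                            tol: float = 1e-9) -> float:
        """Estimate the miss ratio for a cache with `cache_lines` lines."""
        samples = list(reuse_distances)
        n = len(samples)
        survive = 1.0 - 1.0 / cache_lines   # per-replacement survival probability

        m = 0.5                             # initial guess for the miss ratio
        for _ in range(iterations):
            misses = 0.0
            for d in samples:
                if math.isinf(d):           # never reused: always a (cold) miss
                    misses += 1.0
                else:
                    # Roughly m * d replacements occur before the reuse; the
                    # line misses unless it survives all of them.
                    misses += 1.0 - survive ** (m * d)
            new_m = misses / n
            if abs(new_m - m) < tol:
                break
            m = new_m
        return m


    if __name__ == "__main__":
        # Toy reuse-distance sample; a real profiler would collect these
        # sparsely during a native execution of the studied application.
        sample = [10, 50, 200, 1_000, 5_000, 20_000, float("inf")]
        for lines in (256, 1024, 4096):
            print(f"{lines:5d} cache lines -> miss ratio "
                  f"{estimate_miss_ratio(sample, lines):.3f}")

Note how the sampled data is collected once but the model can be re-evaluated for any cache size in a fraction of a second, which mirrors the flexibility the abstract attributes to the proposed method.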