Crystal: an extensible framework for profiling and optimizing large scale system software

Bibliographic Details
Main Authors: Tai, Hung-Ying, 戴宏穎
Other Authors: Lee, Che-Rung
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/dj25em
Description
Summary: Master's thesis === National Tsing Hua University === Department of Computer Science === 105 === In recent years, system software has become larger and more complex. For large-scale programs, integrating profile data collection, analysis algorithm implementation, and optimization poses several challenges. We therefore present an extensible framework called Crystal, which wraps multiple profilers, compilers, and analyzers to gather useful data, including static and dynamic function call graphs, function counts, CPU cycles, and cache miss rates. We also combine these data into a weighted function call graph. With this framework, users can apply analysis and optimization algorithms to a program with less worry about the complexity of the profile data generated by various tools. In this paper, we focus on the framework design for wrapping the various output formats of different tools. The framework provides a compiler unit for translating source code to IR, an optimizer unit for applying profile-guided analysis results to the IR, a profiler unit for gathering accurate data, an analyzer unit for determining optimization strategies, and a datastore unit for storing archived data. Finally, we present an example application of the framework: a two-step method for changing code layout. First, we build a weighted function call graph from the profiling data and apply a community detection algorithm, the Louvain method, to split the entire graph into multiple partitions. Second, we apply the classic Pettis-Hansen algorithm within each partition to reorder functions. In our evaluation, we choose LLVM as the target and design 20 test cases to measure performance. As a result, the performance improvement over O0 increases from 4.5% to 12.4%.
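
The second step of the layout method described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (build_weighted_graph, pettis_hansen_order, layout), the input format, and the sample call counts are all hypothetical, the partitions are assumed to come from a separate community detection step (the Louvain method itself is not implemented here), and the ordering is a simplified "closest is best" variant of Pettis-Hansen function ordering.

```python
from collections import defaultdict

def build_weighted_graph(call_counts):
    """Collapse directed (caller, callee) -> count into undirected edge weights."""
    graph = defaultdict(float)
    for (caller, callee), count in call_counts.items():
        if caller != callee:  # self-recursion carries no layout information
            graph[frozenset((caller, callee))] += count
    return dict(graph)

def pettis_hansen_order(functions, graph):
    """Greedily merge function chains along the heaviest remaining edge."""
    chains = {f: [f] for f in functions}  # chain id -> ordered functions
    owner = {f: f for f in functions}     # function -> id of its chain

    def weight(a, b):
        return graph.get(frozenset((a, b)), 0.0)

    while True:
        # Sum the original edge weights between each pair of distinct chains,
        # ignoring edges that leave this partition.
        between = defaultdict(float)
        for edge, w in graph.items():
            a, b = tuple(edge)
            if a in owner and b in owner and owner[a] != owner[b]:
                between[frozenset((owner[a], owner[b]))] += w
        if not between:
            break
        # Heaviest chain pair first; ties are broken by name for determinism.
        best = max(between, key=lambda e: (between[e], sorted(e)))
        x, y = sorted(best)
        # Try all four orientations and keep the one whose junction joins
        # the two functions with the heaviest original edge.
        candidates = [(left, right)
                      for left in (chains[x], chains[x][::-1])
                      for right in (chains[y], chains[y][::-1])]
        left, right = max(candidates, key=lambda c: weight(c[0][-1], c[1][0]))
        chains[x] = left + right
        del chains[y]
        for f in right:
            owner[f] = x
    # Chains that share no edge are simply concatenated.
    return [f for chain in chains.values() for f in chain]

def layout(call_counts, partitions):
    """Order functions partition by partition (partitions are assumed given)."""
    graph = build_weighted_graph(call_counts)
    order = []
    for part in partitions:
        order.extend(pettis_hansen_order(part, graph))
    return order

if __name__ == "__main__":
    # Hypothetical profile data: (caller, callee) -> dynamic call count.
    call_counts = {
        ("main", "parse"): 100,
        ("parse", "lex"): 80,
        ("main", "log"): 5,
        ("main", "emit"): 10,
        ("emit", "encode"): 60,
    }
    # Hypothetical output of the community detection step.
    partitions = [["main", "parse", "lex", "log"], ["emit", "encode"]]
    print(layout(call_counts, partitions))
    # → ['lex', 'parse', 'main', 'log', 'emit', 'encode']
```

In this sample, the frequently interacting pairs (main, parse) and (parse, lex) end up adjacent, while the rarely called log is pushed to the end of its partition. Partitioning first keeps each greedy merge confined to a small subgraph, which is presumably one motivation for the two-step design.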