Summary: | 博士 === 國立清華大學 === 資訊工程學系 === 103 === Abstract
Digital signal processors (DSPs) with very long instruction word (VLIW) data-path architectures are increasingly being deployed on embedded devices for multimedia processing applications. While developing new VLIW DSP processors, engineers always take complexity, die size, and power dissipation into consideration. Therefore, some popular and traditional designs may not be feasible for embedded systems. Instead, distributed register files and multi-bank register architectures are being adopted to eliminate the amount of read/write ports associated with register files. Although such wide varieties of register file architectures and irregular designs achieve high performance and low power consumption criterion, they present challenges for devising compiler optimization schemes as well.
Compiler optimizations, which direct code generation more efficiency, can be conceptually classified into local and global optimizations. Local optimizations only take place within small scope of code fragment, hence the impact of irregular designs is trivial. On the contrary, global optimizations usually go through entire procedure and try to utilize resources as effectively as possible, so the irregular designs and distributed scenarios make global optimizations difficult to have expected improvement.
This dissertation has made contributions to the development and discussion of global optimizations on compilers for a novel VLIW DSP with distributed register files. The target DSP architecture, known as PAC DSP core, is designed with distinctively banked register files with highly restricted port access. Our experiences of developing global optimizations in compilers for the PAC DSP may also be of interest to those involved in developing compilers for the similar architectures.
Experiments were also performed on the PAC VLIW DSP with distributed register files by incorporating our proposed optimization schemes into an Open64-based compiler. Several benchmarks such as EEMBC and MiBench were tested for evaluating the improvement of utilizing the features of the specific register file architectures. It shows that a VLIW DSP compiler applied by our global optimization schemes exhibits performance superior to traditional strategies.
|