Summary: | Master's === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === The Study of Multilevel Branch Prediction
Student: Gi-Dung Liang Advisor: Dr. Chang-Jiu Chen
Department of Computer Science and Information Engineering
National Chiao Tung University
ABSTRACT
Branch instructions are a major performance bottleneck in modern pipelined superscalar processors because they interrupt the steady flow of the instruction stream through the pipeline. To address this problem, various branch prediction schemes have been proposed, of which three are widely used today. The simplest is the bimodal (bimod) predictor, which uses 2-bit saturating counters to record the outcome history of each branch instruction. The 2-level adaptive predictor uses a two-level architecture to track the correlation among the outcomes of nearby branches. The most complex is the combination predictor, which consists of a bimod predictor and a 2-level predictor and uses a meta-table to choose which of the two results to use. Furthermore, for data and instruction prefetching it has become necessary to look further ahead in the instruction stream than a single branch. This approach increases instruction-level parallelism (ILP) through the use of trace processors and decoupled-access DRAM. For these techniques to be effective, they need sufficient lookahead, i.e., they must run far enough ahead of processor execution when requesting data.
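The three predictor schemes described above can be illustrated with a minimal Python sketch. It is not the thesis implementation or the SimpleScalar code: the class names, table sizes, the `pc >> 2` indexing (which assumes 4-byte instructions), the single global history register used for the 2-level predictor, and the initial counter value are all assumptions chosen for illustration.

```python
def bump(table, index, up):
    """Move a 2-bit saturating counter (range 0..3) one step up or down."""
    table[index] = min(3, table[index] + 1) if up else max(0, table[index] - 1)


class Bimodal:
    """One 2-bit counter per table entry, indexed by the branch address."""

    def __init__(self, entries=1024):           # size is an assumption
        self.mask = entries - 1                  # entries must be a power of two
        self.table = [1] * entries               # start at "weakly not taken"

    def predict(self, pc):
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc, taken):
        bump(self.table, (pc >> 2) & self.mask, taken)


class TwoLevel:
    """Level 1: a global history register. Level 2: counters indexed by it."""

    def __init__(self, history_bits=8):          # history length is an assumption
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.table = [1] * (1 << history_bits)

    def predict(self, pc):
        return self.table[self.history] >= 2

    def update(self, pc, taken):
        bump(self.table, self.history, taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    """Bimodal plus 2-level, with a meta-table choosing between them."""

    def __init__(self, entries=1024, history_bits=8):
        self.bimod = Bimodal(entries)
        self.two_level = TwoLevel(history_bits)
        self.mask = entries - 1
        self.meta = [1] * entries                # >= 2 selects the 2-level result

    def predict(self, pc):
        if self.meta[(pc >> 2) & self.mask] >= 2:
            return self.two_level.predict(pc)
        return self.bimod.predict(pc)

    def update(self, pc, taken):
        b = self.bimod.predict(pc)
        t = self.two_level.predict(pc)
        if b != t:                               # train meta only on disagreement
            bump(self.meta, (pc >> 2) & self.mask, t == taken)
        self.bimod.update(pc, taken)
        self.two_level.update(pc, taken)


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly for a single branch at pc."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


if __name__ == "__main__":
    pattern = [i % 2 == 0 for i in range(200)]   # taken, not taken, taken, ...
    for cls in (Bimodal, TwoLevel, Combined):
        print(cls.__name__, accuracy(cls(), 0x400000, pattern))
```

On a strictly alternating branch, the bimodal counter in this sketch oscillates between its two middle states and mispredicts, while the history-indexed table learns the pattern after a short warm-up. The meta-table then shifts toward the 2-level result, which is the behaviour the combination scheme is meant to capture.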
In this thesis, we propose several multilevel branch prediction mechanisms. Mubp-Like (Multilevel branch predictor-Like) with a not-taken BTB (branch target buffer) uses the last predicted target as the index into the not-taken BTB, which reduces the size of the not-taken BTB. Mubp-Like with a taken BTB uses the last predicted path as the index into the taken BTB, which reduces the size of the taken BTB. Mubp-Like with RIP (reduce interference predictor) uses an auxiliary mechanism, RIP, to reduce the interference in the predictor table caused by loop instructions.
We simulate our designs using the SimpleScalar tool set and compare them with the original Mubp scheme proposed by A. Veidenbaum on some of the SPEC95 benchmarks. The simulation results show that Mubp-Like with the not-taken BTB achieves higher accuracy and reduces hardware cost by 30%. Mubp-Like with the taken BTB achieves approximately the same accuracy and reduces hardware cost by 60%. Mubp-Like with RIP improves accuracy by 1% to 2%.
|