Reconfigurable Cache Memory Mechanism for Integral Image andIntegral Histogram Applications

碩士 === 國立臺灣大學 === 電子工程學研究所 === 100 === With the development of semiconductor technology, the capability of micro-processor doubles almost every 18 months, by the Moore''s Law. However, the speed of off-chip DRAM grows only 7% every year. There is a huge gap between the speed of CPU...

Full description

Bibliographic Details
Main Authors: Po-Hao Hsu, 許博豪
Other Authors: Shao-Yi Chien
Format: Others
Language:zh-TW
Published: 2011
Online Access:http://ndltd.ncl.edu.tw/handle/58964530730016983247
Description
Summary:碩士 === 國立臺灣大學 === 電子工程學研究所 === 100 === With the development of semiconductor technology, the capability of micro-processor doubles almost every 18 months, by the Moore''s Law. However, the speed of off-chip DRAM grows only 7% every year. There is a huge gap between the speed of CPU and DRAM. Cache memory is a high speed memory which can reduce the memory access latency between the processor and off-chip DRAM, and it usually occupies a large area of the whole system.However, for some operation with integral images and integral integral histograms, which are famous for getting an arbitrary-sized block summation and histogram in a constant speed and widely implemented in many applications, the read and write mechanisms of cache are not suitable for such algorithms with stream processing characteristic. With larger cache size, the cycle count can be further reduced. However, the analysis results show that there is a bottleneck of cache hit rate and the cycle count reduction meets a limitation. For the above reasons, in this thesis, a reconfigurable cache memory mechanism is proposed to support both general data access and stream processing. This proposed memory has two modes: normal cache mode and Row-Based Stream Processing (RBSP) mode, which is a specific mechanism for data accessing of integral images and integral histograms. The RBSP mode can reduce the cycle count because all the subsequent necessary data has been precisely prefetched with the basic accessing unit of an image row. Two integral image and integral histogram applications, SURF algorithm and center-surround histogram of salience map, are implemented to verify the proposed mechanism. Moreover, the data reuse scheme intra-filter-size sharing and inter-filter-size sharing between different filter sizes and diffierent filtering stripes are taken into consideration to further reduce the data access to the off-chip DRAM. A mapping algorithm is proposed to help the RBSP memory read and write the data, which is implemented in hardware and software versions. In addition, a method called Memory Dividing Technique (MDT) is also proposed to further reduce the word-length. The whole system is built in the Coware Platform Architect to verify our design. Our target image size is VGA 640 x 480 and the experimental results show that the proposed Reconfigurable RBSP memory can save 38.31% and 48.29% memory cycle count for these two applications compared to the traditional data cache in the same level of size. The hardware is implemented with Verilog-HDL and synthesized with Synopsys Design Compiler in TSMC 180nm technology. The total gate count of RBSP memory is 557.0K. The overhead of our proposed RBSP memory is very small, just 7.61% or 5.28% with hardware or software based implementation compared to the set associative cache.