Summary: | Master's === National Yunlin University of Science and Technology === Department of Electronic Engineering === 104 === High Efficiency Video Coding (HEVC), the next-generation video compression standard, introduces many new coding techniques, such as coding units (CUs) organized in a recursive coding tree structure, more prediction modes, and larger prediction units (PUs) and transform units (TUs). HEVC greatly improves coding efficiency; however, it also makes real-time encoding and decoding systems extremely challenging to implement. Moreover, HEVC is expected to compress videos of much higher resolution than before, so accelerators for key modules are required. Compared with the transform coding of H.264, HEVC supports block sizes that are both variable and larger: four sizes, namely 4, 8, 16, and 32. Such transform coding requires an accelerator that combines a high data throughput rate with a flexible architecture. Therefore, this work proposes a high-performance direct 2-D transform coding architecture that satisfies both the variable-length processing demand and the real-time performance requirement of HEVC.
Current HEVC transform coding designs process the first-dimension transform (e.g., on column data) and then the second-dimension transform (e.g., on row data) to complete the 2-D computation. However, such designs must include a dedicated transpose memory to temporarily store and transpose the intermediate data. In addition, the size of the transpose memory grows quadratically with the block size, which adds cost in chip area and power consumption. Moreover, this part of the circuit contributes nothing directly to computational power. For this reason, we aim to convert part of this cost into direct operational performance by proposing a direct 2-D transform coding algorithm and its corresponding architecture, which are the main contributions of this work. We apply the design concept of a processor architecture to the first-dimension transform, exploiting its high flexibility to compute variable-length transforms. Accordingly, separate firmware libraries that execute 4-, 8-, 16-, and 32-point transforms are developed and installed in the system. For the second-dimension transform, we propose an efficient Multiple Constant Multiplication (MCM) unit in which multiplications are replaced by add-and-shift operations, so that it supports transforms of different lengths while achieving high performance. The proposed direct 2-D algorithm arranges the data processing sequences of the row and column transforms of HEVC CODEC systems so that data transposition is completed on the fly. Implemented in a 90-nm CMOS technology, the proposed multi-length transform design reaches an optimum operating clock frequency of 285 MHz, delivering a data throughput rate of 1587 Mpixels/s at a cost of 379k gates. This performance is sufficient for real-time transform processing of Ultra-High-Definition 4kx2k video (3840x2160 @ 30 fps).
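The add-and-shift idea behind an MCM unit can be illustrated with the 4-point HEVC core transform, whose constants are 64, 83, and 36. The following is a minimal sketch, not the architecture proposed in this work: the function names are hypothetical, the even/odd butterfly is a common textbook decomposition assumed here for illustration, and the scaling and rounding stages of a real HEVC transform are omitted.

```python
# HEVC 4-point core transform matrix (integer approximation of the DCT).
C4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

# Constant multiplications written as shifts and adds (hypothetical helpers).
def mul64(x):
    return x << 6                                  # 64 = 2^6

def mul83(x):
    return (x << 6) + (x << 4) + (x << 1) + x      # 83 = 64 + 16 + 2 + 1

def mul36(x):
    return (x << 5) + (x << 2)                     # 36 = 32 + 4

def transform4_shift_add(x):
    """1-D 4-point transform using an even/odd butterfly and shift-add products."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part
    return [
        mul64(e0 + e1),
        mul83(o0) + mul36(o1),
        mul64(e0 - e1),
        mul36(o0) - mul83(o1),
    ]

def transform4_reference(x):
    """Same transform computed with ordinary multiplications, for comparison."""
    return [sum(c * v for c, v in zip(row, x)) for row in C4]
```

Both functions return the same coefficients for any integer input, which shows that no general-purpose multiplier is needed for the constant products; a hardware MCM unit would additionally share partial sums among the constants.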
When the data throughput rate per unit area is used as the index of hardware efficiency, the proposed design is at least 1.94 times more efficient than existing designs. Moreover, the proposed multi-length transform design can meet the video processing requirements of HDTV 720p, 1080i, and digital cinema while consuming only 90.6 mW when operated at 285 MHz with a 1 V power supply.
|