Summary: | Master's === National Chiao Tung University === Institute of Multimedia Engineering === 106 === The video coding community has long been seeking rate-distortion optimization techniques more effective than the widely adopted greedy approach. The difficulty arises when we need to predict how the coding mode decision made at one stage affects subsequent decisions and thus the overall coding performance.
Reinforcement learning lends itself to such dependent decision-making problems. In this thesis, we introduce reinforcement learning as a mechanism for the coding unit split decision and for intra-frame rate control in HEVC/H.265.
For the coding unit split decision, the task is to determine the splitting of a coding unit without the full rate-distortion optimization search adopted by the current HEVC/H.265 reference software. We formulate the coding unit split decision as a reinforcement learning problem by regarding the luminance samples of a coding unit together with the quantization parameter as its state, the split decision as an action, and the reduction in rate-distortion cost relative to keeping the coding unit intact as the immediate reward. We train convolutional neural networks with Q-learning to approximate the rate-distortion cost reduction of each possible state-action pair. The proposed scheme performs comparably to the full rate-distortion optimization in HM-16.15, incurring a 2.5% average BD-rate loss. While performing on par with the conventional binary classification scheme, our scheme can additionally quantify the rate-distortion cost reduction, which opens up further applications.
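As a concrete illustration of this formulation, the following minimal Python sketch shows a Q-network that maps a 64x64 luma block and its quantization parameter to two Q-values, one per action (keep vs. split), and regresses the Q-value of the taken action onto the observed rate-distortion cost reduction. The layer sizes, the 64x64 coding unit size, and the single regression step are illustrative assumptions rather than the thesis implementation.

    # Minimal sketch (assumed sizes, not the thesis code) of a CU-split Q-network:
    # state = 64x64 luma block + QP, actions = {keep, split},
    # Q-value = estimated rate-distortion cost reduction per action.
    import torch
    import torch.nn as nn

    class CuSplitQNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=4, stride=4), nn.ReLU(),   # 64x64 -> 16x16
                nn.Conv2d(16, 32, kernel_size=2, stride=2), nn.ReLU(),  # 16x16 -> 8x8
            )
            # +1 for the quantization parameter appended to the flattened features
            self.q_head = nn.Sequential(
                nn.Linear(32 * 8 * 8 + 1, 64), nn.ReLU(),
                nn.Linear(64, 2),  # Q(s, keep), Q(s, split)
            )

        def forward(self, luma, qp):
            # luma: (N, 1, 64, 64) normalized samples; qp: (N, 1) normalized QP
            x = self.features(luma).flatten(1)
            return self.q_head(torch.cat([x, qp], dim=1))

    # One Q-learning-style regression step: the target is the observed
    # RD-cost reduction of the action actually taken (a simplified stand-in
    # for the full training procedure).
    net = CuSplitQNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    luma = torch.rand(8, 1, 64, 64)       # dummy CU luminance samples
    qp = torch.rand(8, 1)                 # dummy normalized QP values
    action = torch.randint(0, 2, (8,))    # 0 = keep, 1 = split
    reward = torch.randn(8)               # observed RD-cost reduction
    q = net(luma, qp).gather(1, action.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, reward)
    opt.zero_grad(); loss.backward(); opt.step()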
For intra-frame rate control, the task is to determine a quantization parameter value for every coding tree unit in a frame, with the objective of minimizing the frame-level distortion subject to a rate constraint. We draw an analogy between the rate control problem and the reinforcement learning problem by considering the texture complexity of the coding tree units and the bit balance as the environment state, the quantization parameter value as the action an agent needs to take, and the negative distortion of the coding tree unit as the immediate reward. We train a neural network with Q-learning to serve as our agent, which observes the state and estimates the reward of each possible action. Even when trained on only a limited set of sequences, the proposed model performs comparably to the rate control algorithm in HM-16.15.
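The rate control agent can be sketched in the same spirit: the state concatenates a texture-complexity feature of the current coding tree unit with the remaining bit balance, and the network outputs one Q-value (an estimate of the negative distortion) per candidate quantization parameter. The two-dimensional state, the candidate QP range, and the network sizes below are assumptions made for illustration only, not the thesis configuration.

    # Minimal sketch (assumptions only) of the rate-control Q-network:
    # state = (CTU texture complexity, remaining bit balance),
    # actions = a discrete set of QP values, reward = negative CTU distortion.
    import torch
    import torch.nn as nn

    QP_CANDIDATES = list(range(22, 38))   # assumed discrete QP action set

    class RateControlQNet(nn.Module):
        def __init__(self, state_dim=2, n_actions=len(QP_CANDIDATES)):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, n_actions),  # one Q-value per candidate QP
            )

        def forward(self, state):
            return self.net(state)

    def select_qp(agent, complexity, bit_balance):
        # Greedy action: pick the QP whose predicted Q-value is highest.
        state = torch.tensor([[complexity, bit_balance]], dtype=torch.float32)
        with torch.no_grad():
            return QP_CANDIDATES[int(agent(state).argmax(dim=1))]

    agent = RateControlQNet()
    print(select_qp(agent, complexity=0.7, bit_balance=0.4))  # e.g. 29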
|