Summary: | Master's === National Chiao Tung University === EECS International Graduate Program === 103 === Scene composition is a technique widely used in film and TV production. Merging two 3D videos into one is a very challenging task: the original sequences are often taken by different cameras at different locations, and the two cameras may move differently and have different orientations (poses). Our focus is compositing two RGB-D videos captured with different camera orientations and motion parameters. The key techniques are camera motion estimation, camera orientation estimation, and view synthesis, which produces the motion-compensated and/or orientation-compensated background video. Depth-assisted view synthesis is a key component of this process and strongly affects the final video quality. We therefore propose a refined backward warping technique for view synthesis and adopt the ICP algorithm and a 1-D floor model to calculate the camera motion/orientation parameters.
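To make the backward-warping idea concrete, here is a minimal one-row sketch (not the thesis implementation): for each pixel of the virtual view, the depth-derived disparity tells us where to sample the reference view; the `focal`/`baseline` values, the rectified-stereo assumption, and the toy inputs are all illustrative.

```python
def backward_warp_row(ref_color, virt_depth, focal, baseline):
    """Synthesize one row of a virtual view by backward warping:
    each target pixel looks up its color in the reference row at the
    position given by its depth-derived disparity. Pixels whose source
    falls outside the reference row become holes (None)."""
    out = []
    for x, z in enumerate(virt_depth):
        disparity = focal * baseline / z      # horizontal shift in pixels
        src_x = int(round(x - disparity))     # sample *back* into the reference
        if 0 <= src_x < len(ref_color):
            out.append(ref_color[src_x])
        else:
            out.append(None)                  # hole: no reference sample exists
    return out

# Toy example: nearer pixels (smaller depth) shift more and fall outside
# the reference row, leaving holes at the left edge.
row = backward_warp_row([10, 20, 30, 40, 50],
                        [1.0, 1.0, 2.0, 2.0, 2.0],
                        focal=1.0, baseline=2.0)
# → [None, None, 20, 30, 40]
```

The holes at the left edge are exactly the kind of artifact the superpixel-based refinement described below is meant to repair.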
In this thesis, we use the popular notion of superpixels to refine the previously proposed backward depth warping method. Among the many superpixel generation techniques, one popular and effective method is SLIC (Simple Linear Iterative Clustering). We adopt SLIC with some modifications to match our needs, and use the resulting superpixels to handle the non-occlusion holes that arise in backward depth warping.
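The core of SLIC is a grid-seeded k-means in a joint color/spatial space with a compactness weight. The following is a simplified sketch of that idea (grayscale, pure Python, brute-force assignment); it is illustrative only and omits the thesis's modifications and SLIC's usual search-window optimization. The compactness weight `m` and the toy image are assumptions.

```python
def slic_like(image, k_per_side, m=10.0, iters=5):
    """Simplified SLIC-style superpixels: k-means over (intensity, y, x)
    with grid-initialized centers and a compactness weight m."""
    h, w = len(image), len(image[0])
    step = max(h, w) // k_per_side
    # Grid-initialized cluster centers: (intensity, y, x)
    centers = [[image[y][x], float(y), float(x)]
               for y in range(step // 2, h, step)
               for x in range(step // 2, w, step)]
    labels = [[0] * w for _ in range(h)]
    for _ in range(iters):
        # Assign each pixel to the nearest center in the joint space
        for y in range(h):
            for x in range(w):
                best, best_d = 0, float('inf')
                for i, (ci, cy, cx) in enumerate(centers):
                    dc = (image[y][x] - ci) ** 2              # color distance
                    ds = (y - cy) ** 2 + (x - cx) ** 2        # spatial distance
                    d = dc + (m / step) ** 2 * ds
                    if d < best_d:
                        best_d, best = d, i
                labels[y][x] = best
        # Move each center to the mean of its member pixels
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for y in range(h):
            for x in range(w):
                s = sums[labels[y][x]]
                s[0] += image[y][x]; s[1] += y; s[2] += x; s[3] += 1
        for i, s in enumerate(sums):
            if s[3]:
                centers[i] = [s[0] / s[3], s[1] / s[3], s[2] / s[3]]
    return labels

# Toy 4x4 image: dark left half, bright right half; the left/right
# halves end up in different superpixels.
labels = slic_like([[0, 0, 100, 100] for _ in range(4)], k_per_side=2)
```

In hole filling, each non-occlusion hole can then borrow color or depth from the superpixel it falls in, rather than from an arbitrary neighboring pixel.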
We adopt the ICP (Iterative Closest Point) algorithm to estimate the camera motion parameters. ICP is a popular technique in computer graphics, commonly used to construct 3D models: it takes two 3D point clouds as input and iteratively calculates the transformation between them. In our application, we generate the point clouds from two consecutive frames of a video sequence and calculate the translation vector between the two frames (two camera locations). We also propose an inter-frame camera motion estimation technique to reduce the computing time, and we further adjust the background camera orientation (pose) to match that of the foreground camera. Combining all of the above techniques, we synthesize a good-quality, natural-looking virtual view even when the two original cameras are mismatched in orientation and motion.
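The ICP loop described above can be sketched as follows. This is a minimal translation-only variant (the orientation part handled by the thesis's 1-D floor model is omitted): each iteration matches every source point to its nearest target point by brute force, then shifts the source cloud by the mean residual of those matches. The point clouds and iteration count are illustrative.

```python
def icp_translation(src, dst, iters=20):
    """Estimate the translation aligning 3D point list src onto dst
    by iterating nearest-neighbor matching and mean-residual updates."""
    t = [0.0, 0.0, 0.0]
    moved = [list(p) for p in src]
    for _ in range(iters):
        residual = [0.0, 0.0, 0.0]
        for p in moved:
            # Closest-point correspondence (brute force)
            q = min(dst, key=lambda d: sum((pi - di) ** 2
                                           for pi, di in zip(p, d)))
            for k in range(3):
                residual[k] += q[k] - p[k]
        # The average residual is this iteration's translation update
        for k in range(3):
            step = residual[k] / len(moved)
            t[k] += step
            for p in moved:
                p[k] += step
    return t

# Toy example: src is dst shifted by (0.5, -0.3, 0.2); ICP recovers
# the inverse shift (-0.5, 0.3, -0.2).
dst = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
src = [[p[0] + 0.5, p[1] - 0.3, p[2] + 0.2] for p in dst]
t = icp_translation(src, dst)
```

With RGB-D input, the point clouds fed to ICP come directly from back-projecting each frame's depth map, which is what makes this per-frame translation estimate practical.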
|