Automatic initialization for broadcast sports videos rectification

Bibliographic Details
Main Author: Mohammadi Tari, Shervin
Language: English
Published: University of British Columbia 2012
Online Access: http://hdl.handle.net/2429/39969
Description
Summary: Broadcast sports videos can be captured by a static or a moving camera. With a moving camera, a planar projective transformation (i.e., a homography) has to be computed for every frame of a video sequence to compensate for camera motion and viewpoint changes. A variety of methods have recently been proposed to estimate the homography between two images from various correspondences (e.g., matched points, lines, ellipses, and combinations of these). Because frame-to-frame homography estimation is an iterative process, it needs an initial estimate, and that estimate has to be accurate enough for the method to converge to an optimal solution. Although initialization can be done manually for a few frames, manual initialization is not feasible for the thousands of frames in an entire game. Automatic initialization is therefore an important part of automatic homography estimation. In this dissertation, we address the problem of automatic initialization for homography estimation. More precisely, the proposed method comprises four key modules, namely preprocessing, keyframe selection, keyframe matching, and frame-to-frame homography estimation, which work together to automatically initialize any homography estimation method that can be used for broadcast sports videos. The first module removes blurry images, roughly estimates the game-field area in the remaining salient images, and represents these areas as a set of binary masks. The resulting binary masks are then fed into the keyframe selection module, which selects a set of representative frames using a robust dimensionality reduction method together with a clustering algorithm.
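The preprocessing and keyframe selection steps can be sketched roughly as follows. This is not the thesis's implementation: the blur test uses the variance of a discrete Laplacian, and plain PCA (computed via SVD) followed by k-means stands in for the robust dimensionality reduction and clustering methods the summary mentions. The function names, the blur threshold, and the farthest-point initialization of k-means are illustrative assumptions. Each binary mask is treated as a flattened pixel vector, and the frame closest to each cluster centre is taken as that cluster's keyframe.

```python
import numpy as np

def laplacian_variance(gray):
    """Blur score (assumed criterion): variance of a 4-neighbour Laplacian; low means blurry."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return lap.var()

def drop_blurry(frames, threshold):
    """Return indices of frames whose blur score exceeds a hypothetical threshold."""
    return [i for i, f in enumerate(frames) if laplacian_variance(f) > threshold]

def select_keyframes(masks, k, n_components=10, n_iter=100):
    """Plain PCA + k-means stand-in for the thesis's robust reduction and clustering."""
    X = np.stack([np.asarray(m, dtype=float).ravel() for m in masks])
    Xc = X - X.mean(axis=0)
    # PCA via SVD: project centred masks onto the leading principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_components].T
    # Deterministic farthest-point initialisation of the k centres (an assumption).
    idx = [0]
    for _ in range(1, k):
        d_min = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d_min.argmax()))
    centers = Z[idx].copy()
    # Lloyd iterations: assign to nearest centre, then move centres to cluster means.
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    # Keyframe of each cluster: the member frame nearest to its centre.
    keyframes = sorted(int(np.flatnonzero(labels == c)[dist[labels == c, c].argmin()])
                       for c in range(k) if np.any(labels == c))
    return keyframes, labels
```

For example, masks whose field region ends near the left of the image and masks whose region ends near the right fall into two clusters, and one keyframe is returned from each. A robust variant would be preferable on real footage, where occlusions by players and graphics overlays corrupt the masks.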
The third module finds the closest keyframe to each input frame using three classifiers, whose results are combined by an artificial neural network to improve the overall accuracy of the matching process. The last module takes the input frames and their closest keyframes and computes the model-to-frame homography for every input frame. Finally, we evaluate the accuracy and robustness of the proposed method on one hockey dataset and two basketball datasets.
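The homography bookkeeping in the last module can be illustrated with a short sketch. It assumes, hypothetically, that each keyframe carries a model-to-keyframe homography (for instance, obtained once by hand) and that a keyframe-to-frame homography is estimated from point correspondences with an unnormalized direct linear transform (DLT); the thesis may use other correspondences (lines, ellipses) and a different estimator. Under that assumption, the model-to-frame homography is the product H_model→frame = H_keyframe→frame · H_model→keyframe, which can then serve as the initial estimate for an iterative refinement method.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Unnormalised DLT from >= 4 point pairs (assumed estimator; no RANSAC)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each pair gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]

def model_to_frame(H_model_to_key, H_key_to_frame):
    """Chain the (assumed known) keyframe homography with the keyframe-to-frame one."""
    H = H_key_to_frame @ H_model_to_key
    return H / H[2, 2]
```

In practice, point coordinates are usually normalized before the DLT and outliers are rejected (e.g., with RANSAC); both steps are omitted here to keep the sketch short.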