Summary: | Emerging multimedia applications and services require efficient and flexible coding (MPEG-4) and description (MPEG-7) of visual information. Object-based representations of visual information obtained by scene segmentation are particularly well suited to this purpose. In this work, the segmentation of video sequences is addressed using a combination of features, such as motion, texture and colour. First, the Recursive Shortest Spanning Tree (RSST) is considered as a baseline segmentation tool and is adapted to perform single-feature segmentation using different visual cues. A novel motion-based RSST segmentation algorithm that incorporates multiple motion features into a single cost function is presented. Effective texture segmentation is achieved by a novel scheme relying on mathematical morphology operators. This approach is then extended to colour texture segmentation. Second, multiple-feature segmentation of video sequences emerges as a major focus of this work. The RSST is employed to perform simultaneous multiple-feature segmentation of video sequences in a hierarchical fashion. The presented work demonstrates that the performance of this approach degrades rapidly as the dimensionality of the feature space increases. To overcome this problem, a novel two-stage architecture for object-based segmentation is presented. The first stage locates perceptually meaningful objects using a hierarchy of single-feature segmentation processes. The second stage refines the boundaries of the located objects using a suitable combination of features and a set of appropriate rules. This model is further simplified by minimizing both the number of required sequence-dependent parameters and the number of inputs to the rule-based part of the algorithm.
A comparative evaluation against state-of-the-art competing algorithms is favourable, demonstrating that the proposed architecture achieves accurate, meaningful and consistent segmentations that are intuitively correct and correspond well to a human viewer's notion of the decomposition of a natural scene into its constituent objects.
|