Summary: In this paper, we address weakly supervised object localization through region weighting. In a weakly labelled image or video, different regions are not equally relevant to the semantic label. We first over-segment each image/video into superpixel/supervoxel regions and assign each region a latent weight that represents its support for the semantic label. These weights are then learned by optimizing classification performance against the weak labels. We adopt logistic regression as the base model because of its good performance in the multiple-instance setting. The latent region weights enter the objective function as a feature-level combination of regions, and the weights and the model parameters are optimized alternately. As the weights are updated, the model is trained on the semantic (object) regions independently of the background, so the learned model can distinguish object from non-object regions and produce irregularly shaped object localizations. The method overcomes limitations of applying multiple-instance learning to visual object localization. Experimental results on three datasets validate the effectiveness of the proposed method.
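The summary does not give the exact objective or update rules, so the following is only a minimal sketch of the alternating scheme under stated assumptions, not the authors' implementation. Assumptions: each image is a list of region feature vectors, region weights are a softmax over hypothetical latent scores (so they are positive and sum to 1 per image), the image feature is the weighted sum of region features, both steps use plain gradient ascent on the logistic log-likelihood, and negative images keep uniform weights. After training, each region is scored on its own by the learned classifier to produce a localization. All function names (`train_region_weighted_lr`, `localize`, etc.) and the toy data are hypothetical.

```python
import math

# Sketch only (assumptions, not the paper's exact formulation):
# - weights = softmax(latent scores) per image, so they are positive and sum to 1
# - image feature = weighted sum of region features (feature-level combination)
# - negative images keep uniform weights; only positive images update theirs


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def combine(regions, weights):
    # weighted sum of the region feature vectors of one image
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def train_region_weighted_lr(bags, labels, epochs=500, lr=0.5):
    dim = len(bags[0][0])
    theta, bias = [0.0] * dim, 0.0
    scores = [[0.0] * len(bag) for bag in bags]  # hypothetical latent region scores
    n = len(bags)
    for _ in range(epochs):
        # Step 1: fix region weights, gradient ascent on the log-likelihood w.r.t. theta, bias
        grad_t, grad_b = [0.0] * dim, 0.0
        for bag, y, s in zip(bags, labels, scores):
            x = combine(bag, softmax(s))
            err = y - sigmoid(dot(theta, x) + bias)
            grad_t = [g + err * xd for g, xd in zip(grad_t, x)]
            grad_b += err
        theta = [t + lr * g / n for t, g in zip(theta, grad_t)]
        bias += lr * grad_b / n
        # Step 2: fix theta and bias, update the latent weights of positive images
        for bag, y, s in zip(bags, labels, scores):
            if y == 0:
                continue  # assumption: negative images stay uniformly weighted
            w = softmax(s)
            err = y - sigmoid(dot(theta, combine(bag, w)) + bias)
            # dLL/dw_r = err * (theta . x_r); chain rule through the softmax
            g_w = [err * dot(theta, r) for r in bag]
            avg = sum(wi * gi for wi, gi in zip(w, g_w))
            for k in range(len(s)):
                s[k] += lr * w[k] * (g_w[k] - avg)
    return theta, bias, [softmax(s) for s in scores]


def localize(bag, theta, bias, threshold=0.5):
    # score each region on its own; True marks a predicted object region
    return [sigmoid(dot(theta, r) + bias) >= threshold for r in bag]


# Toy data: region 0 of each positive image is the "object"; the rest is background.
bags = [
    [[1.0, 0.1], [0.0, 1.0], [0.1, 0.9]],   # positive
    [[0.9, 0.0], [0.05, 1.0]],              # positive
    [[0.0, 1.0], [0.1, 0.9], [0.05, 1.0]],  # negative
    [[0.1, 1.0], [0.0, 0.9]],               # negative
]
labels = [1, 1, 0, 0]
theta, bias, weights = train_region_weighted_lr(bags, labels)
print([[round(w, 2) for w in ws] for ws in weights])
print(localize(bags[0], theta, bias))
```

In this toy run the weights of the positive images shift toward the object region, because background regions also appear in negative images and are pushed toward a low score. The softmax parameterization is one convenient way to keep the weights on the simplex; the paper may constrain or update them differently.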