Summary: | The recent flourishing of deep learning in various visual learning tasks is largely credited to rich and accessible labeled data. Nonetheless, massive label supervision remains a luxury for many real-world applications: it is costly and time-consuming to collect and annotate a large amount of training data. Sometimes it is even infeasible to obtain large training datasets, because for certain tasks only a few or even no examples are available, or annotation requires expert knowledge. This dissertation studies label-scarce problems for various visual learning tasks and aims to develop algorithms that perform robustly when only limited labels are provided. The scope of this dissertation falls into the following three lines. The first research line is learning to generalize from limited labels. This research strives to utilize the limited labels available as effectively as possible. We develop two novel algorithms that enhance the generalizability of learning models to new classes (unseen during model training) given limited label supervision. In the first algorithm, we combat label deficiency by performing data augmentation in the feature space. We propose a conditional generative adversarial network that synthesizes new features conditioned on the labeled ones. Two novel regularizers are proposed to encourage the generator to synthesize features that are both discriminative and diverse. In the second algorithm, we propose to learn a meta-learner network that directly generates task-specific networks from the attributes of given classes. Because a customized network is generated from the attributes, desirable performance can be reached when classifying images from the novel classes. The second research line is learning to reuse labels from a relevant but different domain. This research copes with label deficiency in the target domain by reusing labels from another (source) domain. The challenge is how to address the disparity between the data distributions of the two domains. 
This research proposes two domain adaptation algorithms to handle this. The first algorithm targets the object detection task. We perform domain alignment with adversarial learning at both the pixel level and the region proposal level. In addition, we extract rough segmentation maps for images from both domains and use them to align the domains by learning a segmentation task applied in both domains. The second algorithm aligns the domains with consistency learning, which optimizes the model to produce consistent class predictions for different augmented versions of the same image. By regularizing the model to make smooth class predictions under changes in the image space, label supervision can be readily transferred from the labeled source domain to the unlabeled target domain. The last research line is learning representations without labels. This research addresses label deficiency neither by exploiting auxiliary labels from other domains nor by maximizing the utility of the limited labels provided, but instead by mining the intrinsic relationships among unlabeled data samples as the supervision for model training. Under the assumption that visual data are usually structured and lie in different subspaces, this research proposes a deep multi-view subspace clustering algorithm that performs joint deep feature embedding and data affinity recovery by explicitly modeling the subspace relationships among data points in the latent embedding space. We comprehensively explore the multi-view data provided and seek a consensus data affinity relationship that is compatible not only with all views but also with all intermediate embedding spaces. With more constraints imposed, the true data affinity relationship is expected to be recovered more reliably. In summary, this dissertation systematically investigates various techniques for addressing the label-scarce problem in different visual learning tasks, including image classification, object detection, and image clustering. 
Empirical studies show that the proposed techniques effectively alleviate this problem and that desirable performance has been achieved for these tasks.--Author's abstract
|