Summary: | Recent advances in deep learning, especially deep convolutional neural networks, have greatly improved the performance of semantic segmentation systems. Unfortunately, training deep neural networks (DNNs) requires a large amount of labeled data, which is laborious and costly to collect and annotate. Many works have therefore proposed synthetic data as an alternative way to ease training set creation. However, models trained on synthetic data usually underperform on real images because of the well-known domain shift problem. To address it, we propose a generative adversarial network (GAN)-based framework, called category-level adversarial adaptation networks (CAA-Nets), for domain adaptation in semantic segmentation. Because semantic predictions contain the spatial and structural information of images, our idea is to exploit this property by imposing discriminators on the semantic predictions. Unlike existing works, the proposed framework uses a category-level discriminator in the output space to narrow the gap between real and synthetic images. Similar to reinforcement learning, the output-based discriminator lets us use the final results as a guide to update the parameters in the right direction. Moreover, to further improve performance, we construct an image-based generator and discriminator pair to distill the feature representations obtained by a DNN. With these modules, our model achieves performance competitive with some existing methods. To showcase the generality and scalability of our approach, we evaluate it on the Cityscapes dataset by adapting from the GTAV and SYNTHIA datasets, and the results confirm its effectiveness.
|