MLSL: Multi-Level Self-Supervised Learning for Domain Adaptation with Spatially Independent and Semantically Consistent Labeling
This work was published at the Winter Conference on Applications of Computer Vision (WACV) 2020 [Paper]. This blog post introduces the idea and working principle of the proposed approach. See the GitHub repository to reproduce the results.
Objective:
The aim of this work is to develop a self-supervised domain adaptation approach for semantic segmentation. Training a semantic segmentation model requires pixel-level labels, which are laborious and time-consuming to produce. In addition, the adversarial approaches commonly used for this task are unstable and have serious limitations. This work therefore proposes a domain adaptation approach based on pseudo-label generation and self-supervision.
Idea and Approach:
This work leverages the idea that an object (and most "stuff" categories), given enough context, should be labeled consistently regardless of its location in the image.

Hence, we generate spatially independent and semantically consistent (SISC) pseudo-labels by segmenting multiple sub-images with the baseline model and combining the results through an aggregation strategy. Secondly, image-level pseudo-weak-labels (PWL) are computed to guide domain adaptation by capturing global context similarity between the source and target domains at the latent space level. This helps the latent space learn the representation even when very few pixels belong to a given category (small objects, for example) compared to the rest of the image. The complete workflow is described in Fig. 2.
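The two kinds of pseudo-labels can be illustrated with a minimal NumPy sketch. It is not the implementation from the paper or the GitHub repository: the aggregation rule used here (averaging class probabilities over overlapping sub-images and keeping only confident pixels), the function names, the `predict_probs` callable, the ignore value 255, and all thresholds are assumptions chosen for illustration.

```python
import numpy as np

IGNORE = 255  # assumed "unlabeled" value for low-confidence pixels


def sisc_pseudo_labels(image, predict_probs, num_classes,
                       crop=4, stride=2, conf_thresh=0.9):
    """Segment overlapping sub-images and aggregate them into one label map.

    predict_probs is a hypothetical stand-in for the baseline model: it takes
    a sub-image and returns probabilities of shape (num_classes, h, w).
    """
    h, w = image.shape[:2]
    prob_sum = np.zeros((num_classes, h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)

    # Slide a window over the image so each pixel is seen in several contexts.
    for top in range(0, h - crop + 1, stride):
        for left in range(0, w - crop + 1, stride):
            patch = image[top:top + crop, left:left + crop]
            prob_sum[:, top:top + crop, left:left + crop] += predict_probs(patch)
            counts[top:top + crop, left:left + crop] += 1

    # Assumed aggregation: average the probabilities over all covering windows.
    covered = counts > 0
    mean = np.zeros_like(prob_sum)
    mean[:, covered] = prob_sum[:, covered] / counts[covered]

    # Keep a label only where the averaged prediction is confident.
    labels = mean.argmax(axis=0).astype(np.uint8)
    confident = covered & (mean.max(axis=0) >= conf_thresh)
    labels[~confident] = IGNORE
    return labels


def pseudo_weak_labels(labels, num_classes, min_fraction=0.0):
    """Image-level labels: mark a class present if it covers enough pixels."""
    valid = labels[labels != IGNORE]
    total = max(valid.size, 1)
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        present[c] = np.count_nonzero(valid == c) / total > min_fraction
    return present
```

Averaging over overlapping windows is only one possible aggregation rule; it is used here because it rewards pixels that receive the same label in every context in which they appear.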

Using these generated pixel-level and image-level pseudo-labels, we adapt the segmentation network to the target domain. This process of generating pseudo-labels and re-training the segmentation network is repeated iteratively for a fixed number of rounds.
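The iterative procedure can be outlined as a short loop. This is only a schematic sketch: `generate_pseudo_labels` and `retrain` are hypothetical callables standing in for the pseudo-label generation and re-training steps, and the default number of rounds is an arbitrary placeholder rather than a value taken from the paper.

```python
def self_training(model, target_images, generate_pseudo_labels, retrain, rounds=3):
    """Alternate between pseudo-label generation and re-training.

    generate_pseudo_labels(model, image) and retrain(model, images, labels)
    are hypothetical callables supplied by the caller.
    """
    for _ in range(rounds):
        # Label every target image with the current model.
        pseudo = [generate_pseudo_labels(model, img) for img in target_images]
        # Re-train on the target images using the labels just produced.
        model = retrain(model, target_images, pseudo)
    return model
```

Each round labels the target images with the model from the previous round, so improvements in the model feed back into the next set of pseudo-labels.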
MLSL achieves state-of-the-art performance compared to existing self-supervised domain adaptation approaches. Please have a look at the paper and visit our project page for a more detailed discussion of the approach and results. Clone the GitHub repository and follow the instructions to reproduce the results.