Self-Supervised Representation Learning for Content Based Image Retrieval

Automotive technologies and fully autonomous driving have seen tremendous growth in recent times and have benefitted from extensive deep learning research. State-of-the-art deep learning methods are largely supervised and require labelled data for training. However, the annotation process for image data is time-consuming and costly in terms of human effort. It is therefore of interest to find informative samples for labelling using Content Based Image Retrieval (CBIR). Generally, a CBIR method takes a query image as input and returns a set of images that are semantically similar to the query image. Retrieval is achieved by transforming images into feature representations in a latent space, where image similarity can be reasoned about in terms of image content. In this thesis, a self-supervised method is developed to learn feature representations of road scene images. The method learns feature representations by adapting intermediate convolutional features from an existing deep Convolutional Neural Network (CNN), and a contrastive approach based on Noise Contrastive Estimation (NCE) is used to train the feature learning model. For complex images like road scenes, where multiple image aspects can occur simultaneously, it is important to embed all salient image aspects in the feature representation. To achieve this, the output feature representation is obtained as an ensemble of feature embeddings, each learned by focusing on a different image aspect, and an attention mechanism is incorporated to encourage each ensemble member to focus on a different aspect. For comparison, a self-supervised model without attention is considered, and a simple dimensionality reduction approach using SVD is treated as the baseline. The methods are evaluated on nine evaluation datasets using CBIR performance metrics. The datasets correspond to different image aspects and concern the images at different spatial levels: global, semi-global and local. The feature representations learned by the self-supervised methods are shown to perform better than the SVD approach. Given that no labelled data is required for training, learning representations for road scene images with self-supervised methods appears to be a promising direction. Using multiple query images to emphasize a query intention is also investigated, and a clear improvement in CBIR performance is observed. It remains inconclusive whether the addition of an attention mechanism improves CBIR performance; the attention method shows some positive signs in qualitative analysis and performs better than the other methods on one evaluation dataset containing a local aspect. This method for learning feature representations is promising but requires further research involving more diverse and complex image aspects.
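As a rough, hypothetical sketch of the retrieval and contrastive training ideas summarised in the abstract (not code from the thesis), the following Python/PyTorch snippet shows a small embedding head over pre-extracted CNN features, an NCE/InfoNCE-style contrastive loss with in-batch negatives, and a cosine-similarity top-k lookup for CBIR. The names EmbeddingHead, nce_style_loss and retrieve, the embedding dimension and the temperature are illustrative assumptions; the thesis additionally uses an attention-based ensemble of such embeddings, which is omitted here.

    # Hypothetical illustration only; names and hyperparameters are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingHead(nn.Module):
        """Maps intermediate CNN features to a unit-norm embedding for retrieval."""
        def __init__(self, in_dim: int, emb_dim: int = 128):
            super().__init__()
            self.fc = nn.Linear(in_dim, emb_dim)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            z = self.fc(feats)
            # Unit-normalize so that a dot product equals cosine similarity.
            return F.normalize(z, dim=-1)

    def nce_style_loss(anchor: torch.Tensor, positive: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
        """Contrastive (InfoNCE-style) loss: each anchor embedding should match its
        positive while treating the other samples in the batch as noise/negatives."""
        logits = anchor @ positive.t() / temperature           # (B, B) similarity matrix
        targets = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, targets)

    def retrieve(query_emb: torch.Tensor, index_embs: torch.Tensor, k: int = 5) -> torch.Tensor:
        """Return the indices of the k database images most similar to the query."""
        sims = index_embs @ query_emb                          # (N,) cosine similarities
        return sims.topk(k).indices

Multiple query images could, for instance, be combined by averaging their normalized embeddings before the lookup; the record does not specify the exact formulation used in the thesis.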


Bibliographic Details
Main Author: Govindarajan, Hariprasath
Format: Student thesis (bachelor thesis, text/PDF, open access)
Language: English
Published: Linköpings universitet, Statistik och maskininlärning, 2020
Subjects: Content Based Image Retrieval; CBIR; Representation Learning; Self Supervised Learning; Unsupervised Learning; Attention Mechanism; Noise Contrastive Estimation; Autonomous Driving; Computer and Information Sciences; Probability Theory and Statistics
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166223