Self-Supervised Representation Learning for Content Based Image Retrieval

Bibliographic Details
Main Author:	Govindarajan, Hariprasath
Format:	Others
Language:	English
Published:	Linköpings universitet, Statistik och maskininlärning (Statistics and Machine Learning), 2020
Subjects:	Content Based Image Retrieval; CBIR; Representation Learning; Self Supervised Learning; Unsupervised Learning; Attention Mechanism; Noise Contrastive Estimation; Autonomous Driving; Computer and Information Sciences (Data- och informationsvetenskap); Probability Theory and Statistics (Sannolikhetsteori och statistik)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166223

id	ndltd-UPSALLA1-oai-DiVA.org-liu-166223
record_format	oai_dc
type	Student thesis (info:eu-repo/semantics/bachelorThesis); text; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
sources	NDLTD
description	Automotive technologies and fully autonomous driving have seen tremendous growth in recent years and have benefitted from extensive deep learning research. State-of-the-art deep learning methods are largely supervised and require labelled training data. However, annotating image data is time-consuming and costly in human effort, so it is of interest to find informative samples for labelling using Content Based Image Retrieval (CBIR). Generally, a CBIR method takes a query image as input and returns a set of images that are semantically similar to it. Retrieval is achieved by transforming images into feature representations in a latent space, where image similarity can be reasoned about in terms of image content.
In this thesis, a self-supervised method is developed to learn feature representations of road scene images. The method learns these representations by adapting intermediate convolutional features from an existing deep Convolutional Neural Network (CNN). A contrastive approach based on Noise Contrastive Estimation (NCE) is used to train the feature learning model. For complex images such as road scenes, where multiple image aspects can occur simultaneously, it is important to embed all the salient aspects in the feature representation. To achieve this, the output feature representation is obtained as an ensemble of feature embeddings, each learned by focusing on different image aspects. An attention mechanism is incorporated to encourage each ensemble member to focus on a different aspect. For comparison, a self-supervised model without attention is considered, and a simple dimensionality reduction approach using SVD serves as the baseline.
The methods are evaluated on nine evaluation datasets using CBIR performance metrics. The datasets correspond to different image aspects and concern the images at different spatial levels: global, semi-global and local. The feature representations learned by the self-supervised methods outperform the SVD baseline. Since no labelled data is required for training, learning representations of road scene images with self-supervised methods appears to be a promising direction. Using multiple query images to emphasize a query intention is also investigated, and a clear improvement in CBIR performance is observed. It is inconclusive whether adding the attention mechanism affects CBIR performance. The attention method shows some positive signs in qualitative analysis and outperforms the other methods on one evaluation dataset containing a local aspect. This method for learning feature representations is promising but requires further research involving more diverse and complex image aspects.
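The retrieval step summarized in the abstract (embed images, then rank a database by similarity to the query in a latent space) and an NCE-style contrastive objective can be illustrated with a small, dependency-free Python sketch. This is not the thesis's implementation, only a sketch under stated assumptions: the embeddings are toy vectors standing in for CNN-derived features, cosine similarity is assumed as the similarity measure, averaging unit-normalized query embeddings is one assumed way to combine multiple query images, and InfoNCE with a temperature is assumed as the concrete NCE-based loss. All function and parameter names (`l2_normalize`, `cosine`, `retrieve`, `info_nce_loss`, `temperature`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(queries: Sequence[Vector], database: Sequence[Vector],
             k: int) -> List[Tuple[int, float]]:
    """Rank database embeddings by cosine similarity to the query; return the top k.

    Assumption (hypothetical): several query images are combined by averaging
    their unit-normalized embeddings into a single query vector.
    """
    normed = [l2_normalize(q) for q in queries]
    dim = len(normed[0])
    query = [sum(q[i] for q in normed) / len(normed) for i in range(dim)]
    scored = [(idx, cosine(query, item)) for idx, item in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


def info_nce_loss(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
                  temperature: float = 0.1) -> float:
    """InfoNCE-style contrastive loss (an assumed form of the NCE objective).

    Cross-entropy of picking the positive out of {positive} + negatives,
    using temperature-scaled cosine similarities as logits.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    max_logit = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[0]
```

For example, with the database `[[1, 0], [0, 1], [0.7, 0.7]]`, the single query `[1, 0]` ranks index 0 first, while the two queries `[1, 0]` and `[0, 1]` together rank index 2 first, the item that shares both aspects. The contrastive loss is close to zero when the anchor is much more similar to its positive than to the negatives, and it grows as a negative becomes more similar than the positive.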

Similar Items