Zero-shot image classification


Bibliographic Details
Main Author: Long, Yang
Other Authors: Shao, Ling ; Chu, Xiaoli
Published: University of Sheffield 2017
Subjects: 621.3
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.727304
description Image classification is one of the essential tasks for intelligent visual systems. Conventional image classification techniques rely on large numbers of labelled images for supervised learning, which requires expensive human annotation. For truly intelligent systems, a more favourable approach is to teach the machine to classify using prior knowledge, as humans do. For example, a palaeontologist can recognise an extinct species purely from textual descriptions. To this end, Zero-Shot Image Classification (ZIC) is proposed, which aims to build machines that learn to classify images of unseen classes as humans can. The problem can be viewed at two levels. Low-level technical issues belong to the general Zero-Shot Learning (ZSL) problem, which considers how to train a classifier for an unseen visual domain using prior knowledge. High-level issues concern how to design and organise visual knowledge representations to construct a systematic ontology that could serve as an ultimate knowledge base for machines to learn from. This thesis provides a thorough study of the ZIC problem, covering models, challenges, and possible applications. In addition, each main chapter presents an original contribution made during my study. The first solves the problem of Visual-Semantic Ambiguity: the same semantic concept (e.g. an attribute) can refer to a huge variety of visual features, and vice versa. Conventional ZSL methods usually adopt a one-way embedding that maps such high-variance visual features into the semantic space, which can degrade performance. As a solution, a dual-graph regularised embedding algorithm named Visual-Semantic Ambiguity Removal (VSAR) is proposed, which captures the intrinsic local structure of both the visual and semantic spaces. In the intermediate embedding space, the structural difference between the two is reconciled to remove the ambiguity.
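The one-way visual-to-semantic embedding that the thesis critiques can be illustrated with a minimal sketch. Everything here is hypothetical toy data, and ridge regression stands in for the learned embedding; it is not the VSAR method itself, only the conventional baseline it improves on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 seen classes, 2 unseen classes, 10-D visual features, 4-D attributes.
A_seen = rng.random((5, 4))          # attribute vectors of seen classes
A_unseen = rng.random((2, 4))        # attribute vectors of unseen classes (prior knowledge)
X = rng.random((100, 10))            # visual features of seen-class training images
y = rng.integers(0, 5, 100)          # seen-class labels
S = A_seen[y]                        # per-image semantic targets

# One-way embedding: ridge regression from visual space to semantic space.
lam = 0.1
W = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ S)   # shape (10, 4)

# Zero-shot prediction: embed a test image, pick the nearest unseen-class prototype.
x_test = rng.random(10)
s_hat = x_test @ W
pred = np.argmin(np.linalg.norm(A_unseen - s_hat, axis=1))
print(pred)  # index of the predicted unseen class
```

Because many distinct visual feature vectors are regressed onto the same attribute target, this mapping collapses high-variance visual structure — exactly the ambiguity the dual-graph regularisation is designed to remove.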
The second contribution aims to circumvent the costly visual data collection required by conventional supervised classification, using ZSL techniques. The key idea is to synthesise visual features from semantic information, just as humans can imagine the appearance of an unseen class from a semantic description. New objects from unseen classes can then be classified within a conventional supervised framework using the inferred visual features. To overcome the correlation problem, we propose an intermediate Orthogonal Semantic-Visual Embedding (OSVE) space that removes the correlated redundancy. The proposed method achieves promising performance on fine-grained datasets. In the third contribution, the graph constraint of VSAR is incorporated to synthesise improved visual features. The orthogonal embedding is reconsidered as an Information Diffusion problem, and through an orthogonal rotation the synthesised visual features become more discriminative. On four benchmarks, the method demonstrates the advantages of synthesised visual features and significantly outperforms state-of-the-art results. Since most ZSL approaches rely heavily on expensive attributes, the fourth contribution of this thesis explores a more feasible yet effective Semantic Simile model to describe unseen classes. From a group of similes, e.g. an unknown animal has the same parts as a wolf and its colour looks like a bobcat's, implicit attributes are discovered by a graph-cut algorithm. Comprehensive experimental results suggest that these simile-based implicit attributes can significantly boost performance. To minimise the cost of building ontologies for ZIC, the final chapter introduces a novel scheme by which ZIC can be achieved with only a few similes per unseen class; no annotations of seen classes are needed. This approach finally makes ZIC attribute-free, which significantly improves the feasibility of ZIC.
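The feature-synthesis idea can be sketched as learning a semantic-to-visual map on seen classes, "imagining" visual prototypes for unseen classes, and then classifying conventionally. This is a toy illustration under stated assumptions — random data, and a plain least-squares map standing in for the proposed orthogonal embedding:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 5 seen classes, 2 unseen classes, 4-D attributes, 10-D visual features.
A_seen = rng.random((5, 4))          # attribute vectors of seen classes
A_unseen = rng.random((2, 4))        # attribute vectors of unseen classes
X = rng.random((100, 10))            # visual features of seen-class training images
y = rng.integers(0, 5, 100)          # seen-class labels

# Class-mean visual prototypes of the seen classes.
P_seen = np.stack([X[y == c].mean(axis=0) for c in range(5)])

# Semantic-to-visual map (least squares): "imagine" features from attributes.
M, *_ = np.linalg.lstsq(A_seen, P_seen, rcond=None)        # shape (4, 10)

# Synthesise visual prototypes for the unseen classes ...
P_unseen = A_unseen @ M
# ... then classify a test image with a conventional nearest-prototype rule.
x_test = rng.random(10)
pred = np.argmin(np.linalg.norm(P_unseen - x_test, axis=1))
print(pred)  # index of the predicted unseen class
```

In this direction the supervised classifier never changes; only the prototypes are inferred, which is why the synthesised-feature view lets unseen classes drop into a standard pipeline.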
Unseen classes can thus be recognised in a conventional setting without an expensive attribute ontology. In conclusion, the methods introduced in this thesis provide the fundamental components of a zero-shot image classification system. The thesis also points out four core directions for future ZIC research.
Format: Electronic Thesis or Dissertation
Also available at: http://etheses.whiterose.ac.uk/18613/