An Entropy Clustering Approach for Assessing Visual Question Difficulty

We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotations to the difficulty. Prior works have considered the diversity of ground-truth answers of human annotators. In contrast, we analyze the difficulty of...

Bibliographic Details
Main Authors:	Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Computer vision; visual question answering; entropy of answer distributions
Online Access:	https://ieeexplore.ieee.org/document/9187418/

id	doaj-f6037eaddbd04667bb6e4c8b599a235e
record_format	Article
spelling	doaj-f6037eaddbd04667bb6e4c8b599a235e; 2021-03-30T03:32:49Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 180633-180645; doi:10.1109/ACCESS.2020.3022063; article 9187418; An Entropy Clustering Approach for Assessing Visual Question Difficulty; Kento Terao (Hiroshima University, Hiroshima, Japan); Toru Tamaki (https://orcid.org/0000-0001-9712-7777; Hiroshima University, Hiroshima, Japan); Bisser Raytchev (https://orcid.org/0000-0002-2146-415X; Hiroshima University, Hiroshima, Japan); Kazufumi Kaneda (Hiroshima University, Hiroshima, Japan); Shin'ichi Satoh (https://orcid.org/0000-0001-6995-6447; National Institute of Informatics, Tokyo, Japan)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh
author_facet	Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh
author_sort	Kento Terao
title	An Entropy Clustering Approach for Assessing Visual Question Difficulty
title_sort	entropy clustering approach for assessing visual question difficulty
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotations to the difficulty. Prior works have considered the diversity of ground-truth answers of human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the entropy values of the predicted answer distributions obtained by three different models: a baseline method that takes as input images and questions, and two variants that take as input images only and questions only. We use a simple k-means to cluster the visual questions of the VQA v2 validation set. Then we use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. A benefit of the proposed method is that no annotation of the difficulty is required, because the accuracy of each cluster reflects the difficulty of visual questions that belong to it. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods. Detailed analysis on the VQA v2 dataset reveals that 1) all methods show poor performances on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy. We show that our approach has the advantage of being able to assess the difficulty of visual questions without ground-truth (i.e., the test set of VQA v2) by assigning them to one of the clusters. We expect that this can stimulate the development of novel directions of research and new algorithms.
topic	Computer vision; visual question answering; entropy of answer distributions
url	https://ieeexplore.ieee.org/document/9187418/
_version_	1724183363229581312
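
The description field in this record outlines a pipeline: compute the entropy of the predicted answer distribution from three models (image and question, image only, question only), then cluster the resulting entropy values with k-means. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the use of natural-log entropy, the toy distributions, and the choice of k = 2 are all assumptions made for illustration; the record does not specify them.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats, an assumption) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_features(dist_iq, dist_i, dist_q):
    """Hypothetical 3-D feature for one visual question: entropies of the
    answer distributions from image+question, image-only, and question-only models."""
    return (entropy(dist_iq), entropy(dist_i), entropy(dist_q))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy answer distributions (made up): peaked = confident, flat = uncertain.
peaked_a = [0.97, 0.01, 0.01, 0.01]
peaked_b = [0.94, 0.02, 0.02, 0.02]
flat_a = [0.25, 0.25, 0.25, 0.25]
flat_b = [0.30, 0.25, 0.25, 0.20]

questions = [
    entropy_features(peaked_a, peaked_b, peaked_a),
    entropy_features(peaked_b, peaked_a, peaked_b),
    entropy_features(flat_a, flat_b, flat_a),
    entropy_features(flat_b, flat_a, flat_b),
]
labels, centroids = kmeans(questions, k=2)
print(labels)
```

In this toy setup the two low-entropy questions end up in one cluster and the two high-entropy questions in the other. Following the description, the per-cluster accuracy of separate VQA models would then be used to interpret each cluster's difficulty; that step needs ground-truth answers and is not shown here.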
