Learning to Detect Objects from Eye-Tracking Data

One of the bottlenecks in computer vision, especially in object detection, is the need for large amounts of training data, typically acquired by manually annotating images. In this study, we explore the possibility of using eye-trackers to provide training data for supervised machine learning. We have created a new large-scale eye-tracking dataset, collecting fixation data for 6270 images from the Pascal VOC 2012 database, covering 10 of the 20 classes included in the Pascal database. Each image was viewed by 5 observers, and a total of over 178k fixations have been collected. While previous attempts at using fixation data in computer vision were based on a free-viewing paradigm, we used a visual search task in order to increase the proportion of fixations on the target object. Furthermore, we divided the dataset into five pairs of semantically similar classes (cat/dog, bicycle/motorbike, horse/cow, boat/aeroplane and sofa/diningtable), with the observer having to decide which class each image belonged to. This kept the observer's task simple while decreasing the chance of them using the scene gist to identify the target parafoveally. In order to alleviate the central bias in scene viewing, the images were presented to the observers with a random offset. The goal of our project is to use the eye-tracking information in order to detect and localise the attended objects. Our model so far, based on features representing the locations of the fixations and an appearance model of the attended regions, can successfully predict the location of the target object in over half of the images.
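As a rough illustration of the localisation idea (this is a simplified sketch, not the authors' actual model, which also uses an appearance model of the attended regions), pooled fixation coordinates can be turned into a bounding-box proposal. The function name and the percentile-trimming heuristic below are assumptions for illustration only:

```python
# Illustrative sketch: propose a bounding box for the attended object
# from fixation points pooled over a single image's observers.
# Input: a list of (x, y) fixation coordinates in image pixels.

def fixation_bbox(fixations, lo=10, hi=90):
    """Return (x_min, y_min, x_max, y_max) covering the central mass
    of fixations. Trimming to the [lo, hi] percentile range keeps a
    few stray fixations (e.g. on the background) from inflating the box."""
    xs = sorted(x for x, _ in fixations)
    ys = sorted(y for _, y in fixations)

    def pct(vals, p):
        # Nearest-rank percentile; adequate for small fixation counts.
        k = max(0, min(len(vals) - 1, round(p / 100 * (len(vals) - 1))))
        return vals[k]

    return (pct(xs, lo), pct(ys, lo), pct(xs, hi), pct(ys, hi))
```

For example, five fixations clustered on an object plus one stray background fixation yield a box around the cluster, with the outlier trimmed away.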

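The random-offset presentation described in the abstract can be sketched as follows; this is a hypothetical helper, not code from the paper, and the parameter names are assumptions:

```python
import random

def offset_position(img_w, img_h, screen_w, screen_h, margin=0):
    """Pick a random top-left corner for the stimulus image so that it
    still fits entirely on screen. Randomising the placement counteracts
    the central bias in scene viewing (observers' tendency to fixate
    near the screen centre regardless of content)."""
    x = random.randint(margin, screen_w - img_w - margin)
    y = random.randint(margin, screen_h - img_h - margin)
    return x, y
```

Each trial would then draw a fresh offset, so the image centre no longer coincides with the screen centre.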
Bibliographic Details
Main Authors: D. P. Papadopoulous, A. D. F. Clarke, F. Keller, V. Ferrari
Format: Article
Language: English
Published: SAGE Publishing 2014-08-01
Series:i-Perception
Online Access:http://ipe.sagepub.com/content/5/5/488.full.pdf
Volume/Issue: 5(5), p. 488
ISSN: 2041-6695
DOI: 10.1068/ii57
Affiliation: School of Informatics, University of Edinburgh, UK