Learning to Detect Objects from Eye-Tracking Data
One of the bottlenecks in computer vision, especially in object detection, is the need for a large amount of training data, which is typically acquired by manually annotating images. In this study, we explore the possibility of using eye-trackers to provide training data for supervised machine learning. We have created a new large-scale eye-tracking dataset, collecting fixation data for 6270 images from the Pascal VOC 2012 database, covering 10 of the 20 classes included in the Pascal database. Each image was viewed by 5 observers, and a total of over 178k fixations have been collected. While previous attempts at using fixation data in computer vision were based on a free-viewing paradigm, we used a visual search task in order to increase the proportion of fixations on the target object. Furthermore, we divided the dataset into five pairs of semantically similar classes (cat/dog, bicycle/motorbike, horse/cow, boat/aeroplane and sofa/diningtable), with the observer having to decide which class each image belonged to. This kept the observer's task simple, while decreasing the chance of their using the scene gist to identify the target parafoveally. In order to alleviate the central bias in scene viewing, the images were presented to the observers with a random offset. The goal of our project is to use the eye-tracking information in order to detect and localise the attended objects. Our model so far, based on features representing the locations of the fixations and an appearance model of the attended regions, can successfully predict the location of the target object in over half of the images.
Main Authors: | D.P. Papadopoulous, A.D.F. Clarke, F. Keller, V. Ferrari |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2014-08-01 |
Series: | i-Perception |
Online Access: | http://ipe.sagepub.com/content/5/5/488.full.pdf |
ISSN: | 2041-6695 |
DOI: | 10.1068/ii57 |
Citation: | i-Perception 5(5), p. 488, 2014-08-01 |
Affiliation: | School of Informatics, University of Edinburgh, UK |
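The abstract's central idea, that the spatial distribution of fixations already points at the target object, can be illustrated with a toy localisation baseline. The sketch below is our own illustration, not the authors' model (which additionally uses an appearance model of the attended regions); the function name, grid size, and mass threshold are assumptions chosen for clarity.

```python
import numpy as np

def fixation_bounding_box(fixations, img_w, img_h, grid=32, keep=0.5):
    """Naive baseline: histogram fixation points on a coarse grid and
    return the bounding box of the cells holding the top `keep` fraction
    of fixation mass. Illustrative only."""
    fixations = np.asarray(fixations, dtype=float)
    # Bin fixation (x, y) coordinates into a grid x grid histogram.
    hist, xedges, yedges = np.histogram2d(
        fixations[:, 0], fixations[:, 1],
        bins=grid, range=[[0, img_w], [0, img_h]])
    # Rank cells by fixation count and keep the densest cells until
    # `keep` of the total fixation mass is covered.
    order = np.argsort(hist, axis=None)[::-1]
    cum = np.cumsum(hist.flatten()[order])
    n_cells = np.searchsorted(cum, keep * cum[-1]) + 1
    idx = np.unravel_index(order[:n_cells], hist.shape)
    # Convert the selected cell indices back to pixel coordinates.
    x0, x1 = xedges[idx[0].min()], xedges[idx[0].max() + 1]
    y0, y1 = yedges[idx[1].min()], yedges[idx[1].max() + 1]
    return x0, y0, x1, y1
```

With the search-task fixations described in the abstract concentrated on the target, even this appearance-free heuristic returns a box near the object; the paper's reported result (correct localisation in over half of the images) rests on the richer combination of fixation-location features with an appearance model.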