A study in human attention to guide computational action recognition

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 93-95). === Computer vision researchers have a lot to learn from the human visual system....


Bibliographic Details
Main Author: Sinai, Sam
Other Authors: Patrick H. Winston.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2014
Subjects: Electrical Engineering and Computer Science
Online Access: http://hdl.handle.net/1721.1/91871
id ndltd-MIT-oai-dspace.mit.edu-1721.1-91871
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-91871 2019-05-02T15:36:16Z A study in human attention to guide computational action recognition Sinai, Sam Patrick H. Winston. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 93-95). by Sam Sinai. M. Eng. 2014-11-24T18:41:27Z 2014-11-24T18:41:27Z 2014 2014 Thesis http://hdl.handle.net/1721.1/91871 894355843 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 95 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
description Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 93-95). === Computer vision researchers have a lot to learn from the human visual system. We, as humans, are usually unaware of how enormously difficult it is to watch a scene and summarize its most important events in words. We only begin to appreciate this truth when we attempt to build a system that performs comparably. In this thesis, I study two features of the human visual apparatus: attention and peripheral vision. I then use these to propose heuristics for computational approaches to action recognition. I think that building a system modeled after human vision, with its nonuniform distribution of resolution and processing power, can greatly increase the performance of computer systems that target action recognition. In this study: (i) I develop and construct tools that allow me to study human vision and its role in action recognition, (ii) I perform four distinct experiments to gain insight into the role of attention and peripheral vision in this task, (iii) I propose computational heuristics, as well as mechanisms, that I believe will increase the efficiency and recognition power of artificial vision systems. The tools I have developed can be applied to a variety of studies, including those performed on online crowd-sourcing markets (e.g. Amazon's Mechanical Turk). With my human experiments, I demonstrate that there is consistency of visual behavior among multiple subjects when they are asked to report the occurrence of a verb. Further, I demonstrate that while peripheral vision may play only a small direct role in action recognition, it is a key component of attentional allocation, whereby it becomes fundamental to action recognition. Moreover, I propose heuristics, based on these experiments, that can inform artificial systems. In particular, I argue that the proper medium for action recognition is video, not still images, and that the basic driver of attention should be movement. Finally, I outline a computational mechanism that incorporates these heuristics into an implementable scheme. === by Sam Sinai. === M. Eng.
author2 Patrick H. Winston.
author_facet Patrick H. Winston.
Sinai, Sam
author Sinai, Sam
author_sort Sinai, Sam
title A study in human attention to guide computational action recognition
title_sort study in human attention to guide computational action recognition
publisher Massachusetts Institute of Technology
publishDate 2014
url http://hdl.handle.net/1721.1/91871