A study in human attention to guide computational action recognition
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 93-95). === Computer vision researchers have a lot to learn from the human visual system....
Main Author: | |
---|---|
Other Authors: | |
Format: | Others |
Language: | English |
Published: |
Massachusetts Institute of Technology
2014
|
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/91871 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-91871 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-918712019-05-02T15:36:16Z A study in human attention to guide computational action recognition Sinai, Sam Patrick H. Winston. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 93-95). Computer vision researchers have a lot to learn from the human visual system. We, as humans, are usually unaware of how enormously difficult it is to watch a scene and summarize its most important events in words. We only begin to appreciate this truth when we attempt to build a system that performs comparably. In this thesis, I study two features of human visual apparatus: Attention and Peripheral Vision. I then use these to propose heuristics for computational approaches to action recognition. I think that building a system modeled after human vision, with the nonuniform distribution of resolution and processing power, can greatly increase the performance of the computer systems that target action recognition. In this study: (i) I develop and construct tools that allow me to study human vision and its role in action recognition, (ii) I perform four distinct experiments to gain insight into the role of attention and peripheral vision in this task, (iii) I propose computational heuristics, as well as mechanisms, that I believe will increase the efficiency, and recognition power of artificial vision systems. The tools I have developed can be applied to a variety of studies, including those performed on online crowd-sourcing markets (e.g. Amazon's Mechanical Turk). With my human experiments, I demonstrate that there is consistency of visual behavior among multiple subjects when they are asked to report the occurrence of a verb. Further, I demonstrate that while peripheral vision may play a small direct role in action recognition, it is a key component of attentional allocation, whereby it becomes fundamental to action recognition. Moreover, I propose heuristics based on these experiments, that can be informative to the artificial systems. In particular, I argue that the proper medium for action recognition are videos, not still images, and the basic driver of attention should be movement. Finally, I outline a computational mechanism that incorporates these heuristics into an implementable scheme. by Sam Sinai. M. Eng. 2014-11-24T18:41:27Z 2014-11-24T18:41:27Z 2014 2014 Thesis http://hdl.handle.net/1721.1/91871 894355843 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 95 pages application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
spellingShingle |
Electrical Engineering and Computer Science. Sinai, Sam A study in human attention to guide computational action recognition |
description |
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 93-95). === Computer vision researchers have a lot to learn from the human visual system. We, as humans, are usually unaware of how enormously difficult it is to watch a scene and summarize its most important events in words. We only begin to appreciate this truth when we attempt to build a system that performs comparably. In this thesis, I study two features of human visual apparatus: Attention and Peripheral Vision. I then use these to propose heuristics for computational approaches to action recognition. I think that building a system modeled after human vision, with the nonuniform distribution of resolution and processing power, can greatly increase the performance of the computer systems that target action recognition. In this study: (i) I develop and construct tools that allow me to study human vision and its role in action recognition, (ii) I perform four distinct experiments to gain insight into the role of attention and peripheral vision in this task, (iii) I propose computational heuristics, as well as mechanisms, that I believe will increase the efficiency, and recognition power of artificial vision systems. The tools I have developed can be applied to a variety of studies, including those performed on online crowd-sourcing markets (e.g. Amazon's Mechanical Turk). With my human experiments, I demonstrate that there is consistency of visual behavior among multiple subjects when they are asked to report the occurrence of a verb. Further, I demonstrate that while peripheral vision may play a small direct role in action recognition, it is a key component of attentional allocation, whereby it becomes fundamental to action recognition. Moreover, I propose heuristics based on these experiments, that can be informative to the artificial systems. In particular, I argue that the proper medium for action recognition are videos, not still images, and the basic driver of attention should be movement. Finally, I outline a computational mechanism that incorporates these heuristics into an implementable scheme. === by Sam Sinai. === M. Eng. |
author2 |
Patrick H. Winston. |
author_facet |
Patrick H. Winston. Sinai, Sam |
author |
Sinai, Sam |
author_sort |
Sinai, Sam |
title |
A study in human attention to guide computational action recognition |
title_short |
A study in human attention to guide computational action recognition |
title_full |
A study in human attention to guide computational action recognition |
title_fullStr |
A study in human attention to guide computational action recognition |
title_full_unstemmed |
A study in human attention to guide computational action recognition |
title_sort |
study in human attention to guide computational action recognition |
publisher |
Massachusetts Institute of Technology |
publishDate |
2014 |
url |
http://hdl.handle.net/1721.1/91871 |
work_keys_str_mv |
AT sinaisam astudyinhumanattentiontoguidecomputationalactionrecognition AT sinaisam studyinhumanattentiontoguidecomputationalactionrecognition |
_version_ |
1719025178810253312 |