Data-Driven Visual Forecasting

Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, it has only been relatively recently that researchers have focused on visual forecasting: getting machines to anticipate e...


Bibliographic Details
Main Author:	Walker, Jacob Charles
Format:	Others
Published:	Research Showcase @ CMU 2018
Online Access:	http://repository.cmu.edu/dissertations/1221 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2260&context=dissertations

id	ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-2260
record_format	oai_dc
spelling	ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-2260 2018-06-07T03:25:21Z Data-Driven Visual Forecasting Walker, Jacob Charles Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, it has only been relatively recently that researchers have focused on visual forecasting: getting machines to anticipate events in the visual world before they actually happen. This aspect of vision has many practical implications for tasks ranging from human-computer interaction to anomaly detection. In addition, temporal prediction can serve as a task for representation learning, useful for various other recognition problems. In this thesis, we focus on visual forecasting that is data-driven, self-supervised, and relies on little to no explicit semantic information. Towards this goal, we explore prediction at different timeframes. We first consider predicting instantaneous pixel motion, that is, optical flow. We apply convolutional neural networks to predict optical flow in static images. We then extend this idea to a longer timeframe, generalizing to pixel trajectory prediction in spacetime. We incorporate models such as variational autoencoders to generate future possible motions in the scene. After this, we consider a mid-level element approach to forecasting. By combining a Markovian reasoning framework with an intermediate representation, we are able to forecast events over longer timescales. This dissertation then builds upon these ideas towards structured representations for visual forecasting. Specifically, we aim to reason about the future of images in a structured state space. Instead of directly predicting events in a low-level feature space such as pixels or motion, we forecast events in a higher-level representation that is still visually meaningful. This approach confers a number of advantages. It is not restricted by explicit timescales like motion-based approaches, and, unlike direct pixel-based approaches, predictions are less likely to “fall off” the manifold of the true visual world. 2018-04-01T07:00:00Z text application/pdf http://repository.cmu.edu/dissertations/1221 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2260&context=dissertations Dissertations Research Showcase @ CMU
collection	NDLTD
format	Others
sources	NDLTD
description	Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, it has only been relatively recently that researchers have focused on visual forecasting: getting machines to anticipate events in the visual world before they actually happen. This aspect of vision has many practical implications for tasks ranging from human-computer interaction to anomaly detection. In addition, temporal prediction can serve as a task for representation learning, useful for various other recognition problems. In this thesis, we focus on visual forecasting that is data-driven, self-supervised, and relies on little to no explicit semantic information. Towards this goal, we explore prediction at different timeframes. We first consider predicting instantaneous pixel motion, that is, optical flow. We apply convolutional neural networks to predict optical flow in static images. We then extend this idea to a longer timeframe, generalizing to pixel trajectory prediction in spacetime. We incorporate models such as variational autoencoders to generate future possible motions in the scene. After this, we consider a mid-level element approach to forecasting. By combining a Markovian reasoning framework with an intermediate representation, we are able to forecast events over longer timescales. This dissertation then builds upon these ideas towards structured representations for visual forecasting. Specifically, we aim to reason about the future of images in a structured state space. Instead of directly predicting events in a low-level feature space such as pixels or motion, we forecast events in a higher-level representation that is still visually meaningful. This approach confers a number of advantages. It is not restricted by explicit timescales like motion-based approaches, and, unlike direct pixel-based approaches, predictions are less likely to “fall off” the manifold of the true visual world.
author	Walker, Jacob Charles
spellingShingle	Walker, Jacob Charles Data-Driven Visual Forecasting
author_facet	Walker, Jacob Charles
author_sort	Walker, Jacob Charles
title	Data-Driven Visual Forecasting
title_short	Data-Driven Visual Forecasting
title_full	Data-Driven Visual Forecasting
title_fullStr	Data-Driven Visual Forecasting
title_full_unstemmed	Data-Driven Visual Forecasting
title_sort	data-driven visual forecasting
publisher	Research Showcase @ CMU
publishDate	2018
url	http://repository.cmu.edu/dissertations/1221 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2260&context=dissertations
work_keys_str_mv	AT walkerjacobcharles datadrivenvisualforecasting
_version_	1718692501194276864

