Summary: | Cameras are ubiquitous and hold the promise of significantly changing the way we live and interact with our environment. A major roadblock to achieving this potential is the curse of dimensionality: the actionable information is often a very small fraction of a vast amount of data and is difficult to extract in real time. In this thesis we propose to address this issue by exploiting dynamics-based invariants as an information-encapsulating paradigm. The
approach is inspired by the fundamental fact that visual data comes in streams: videos are temporal sequences of frames, images are ordered sequences of rows of pixels, and contours are chained sequences of edges. We make this ordering explicit by treating the data streams as outputs of dynamic systems whose associated quantities are invariant to affine transformations, initial conditions, and viewpoint changes. These invariants provide compact representations of the dynamic
information in the data, yet they can be extracted efficiently without identifying the underlying models. The power of the proposed framework is illustrated by applying it to several problems in dynamic scene understanding: activity recognition, shape representation, and multi-camera tracking.
|