Sparse Coding and Compressed Sensing: Locally Competitive Algorithms and Random Projections

Bibliographic Details
Other Authors: Hahn, William E. (author)
Format: Others
Language: English
Published: Florida Atlantic University
Subjects:
Online Access: http://purl.flvc.org/fau/fd/FA00004713
Description
Summary: For an 8-bit grayscale image patch of size n x n, the number of distinguishable signals is 256^(n^2). Natural images (e.g., photographs of a natural scene) comprise a very small subset of these possible signals. Traditional image and video processing relies on band-limited or low-pass signal models. In contrast, we will explore the observation that most signals of interest are sparse, i.e., in a particular basis most of the expansion coefficients will be zero. Recent developments in sparse modeling and L1 optimization have allowed for extraordinary applications such as the single-pixel camera, as well as computer vision systems that can exceed human performance. Here we present a novel neural network architecture combining a sparse filter model and locally competitive algorithms (LCAs), and demonstrate the network's ability to classify human actions from video. Sparse filtering is an unsupervised feature learning algorithm designed to optimize the sparsity of the feature distribution directly, without needing to model the data distribution. LCAs are defined by a system of differential equations where the initial conditions define an optimization problem and the dynamics converge to a sparse decomposition of the input vector. We applied this architecture to train a classifier on categories of motion in human action videos. Inputs to the network were small 3D patches taken from frame differences in the videos. Dictionaries were derived for each action class, and the activation levels for each dictionary were then assessed during reconstruction of a novel test patch. We discuss how this sparse modeling approach provides a natural framework for multi-sensory and multimodal data processing, including RGB video, RGBD video, hyper-spectral video, and stereo audio/video streams. === Includes bibliography. === Dissertation (Ph.D.)--Florida Atlantic University, 2016. === FAU Electronic Theses and Dissertations Collection
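
The summary describes LCAs as differential equations whose dynamics settle on a sparse decomposition of the input. The following is a minimal sketch of that idea, not the dissertation's implementation. It assumes the commonly published LCA formulation (leaky units driven by the input, inhibited by active neighbors in proportion to dictionary overlap, with a soft-threshold activation) and integrates it with plain Euler steps. The function names, the toy dictionary, and the parameters `lam`, `tau`, `dt`, and `steps` are all illustrative assumptions.

```python
def soft_threshold(u, lam):
    """Activation: shrink u toward zero by lam; values with |u| <= lam become 0."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def dot(p, q):
    """Inner product of two equal-length lists of floats."""
    return sum(pi * qi for pi, qi in zip(p, q))


def lca(x, dictionary, lam=0.1, tau=10.0, dt=1.0, steps=5000):
    """Euler-integrate assumed LCA dynamics and return sparse coefficients.

    x          -- input vector (list of floats)
    dictionary -- list of atoms, each a list of floats (assumed unit norm)
    Dynamics (assumed form): tau * du_i/dt = b_i - u_i - sum_{j != i} G_ij * a_j,
    where b = atoms . x, G = atom overlaps, a = soft_threshold(u).
    """
    m = len(dictionary)
    b = [dot(atom, x) for atom in dictionary]                 # feedforward drive
    gram = [[dot(p, q) for q in dictionary] for p in dictionary]
    u = [0.0] * m                                             # internal states
    for _ in range(steps):
        a = [soft_threshold(ui, lam) for ui in u]
        new_u = []
        for i in range(m):
            # Lateral inhibition from every other currently active unit.
            inhibition = sum(gram[i][j] * a[j] for j in range(m) if j != i)
            new_u.append(u[i] + dt * (b[i] - u[i] - inhibition) / tau)
        u = new_u
    return [soft_threshold(ui, lam) for ui in u]


# Toy overcomplete dictionary: three unit-norm atoms in two dimensions.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
signal = [1.2, 1.6]          # exactly 2 * atoms[2]
coeffs = lca(signal, atoms)
print(coeffs)
```

In this toy case the signal is a multiple of the third atom, so the competition should leave only that unit active, with a coefficient slightly below 2 because the threshold `lam` shrinks it. The first two units rise briefly and are then suppressed by inhibition, which is the "locally competitive" behavior the summary refers to.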