Sparse Coding and Compressed Sensing: Locally Competitive Algorithms and Random Projections
For an 8-bit grayscale image patch of size n x n, the number of distinguishable signals is 256(n2). Natural images (e.g.,photographs of a natural scene) comprise a very small subset of these possible signals. Traditional image and video processing relies on band-limited or low-pass signal models....
Other Authors: | |
---|---|
Format: | Others |
Language: | English |
Published: |
Florida Atlantic University
|
Subjects: | |
Online Access: | http://purl.flvc.org/fau/fd/FA00004713 http://purl.flvc.org/fau/fd/FA00004713 |
Summary: | For an 8-bit grayscale image patch of size n x n, the number of distinguishable
signals is 256(n2). Natural images (e.g.,photographs of a natural scene) comprise a
very small subset of these possible signals. Traditional image and video processing
relies on band-limited or low-pass signal models. In contrast, we will explore the
observation that most signals of interest are sparse, i.e. in a particular basis most
of the expansion coefficients will be zero. Recent developments in sparse modeling
and L1 optimization have allowed for extraordinary applications such as the single
pixel camera, as well as computer vision systems that can exceed human performance.
Here we present a novel neural network architecture combining a sparse filter model
and locally competitive algorithms (LCAs), and demonstrate the networks ability to
classify human actions from video. Sparse filtering is an unsupervised feature learning
algorithm designed to optimize the sparsity of the feature distribution directly without
having the need to model the data distribution. LCAs are defined by a system of
di↵erential equations where the initial conditions define an optimization problem and the dynamics converge to a sparse decomposition of the input vector. We applied
this architecture to train a classifier on categories of motion in human action videos.
Inputs to the network were small 3D patches taken from frame di↵erences in the
videos. Dictionaries were derived for each action class and then activation levels for
each dictionary were assessed during reconstruction of a novel test patch. We discuss
how this sparse modeling approach provides a natural framework for multi-sensory
and multimodal data processing including RGB video, RGBD video, hyper-spectral
video, and stereo audio/video streams. === Includes bibliography. === Dissertation (Ph.D.)--Florida Atlantic University, 2016. === FAU Electronic Theses and Dissertations Collection |
---|