Summary: | Embedding local properties of an image, such as its color intensities or the magnitude and orientation of its gradients, into a representative feature is a critical component of many computer vision tasks, including detection, classification, segmentation and tracking. A feature that is representative yet invariant to nuisance factors supports the downstream modules of the processing pipeline and leads to better performance on the task at hand. Statistical moments have often been used to build such descriptors, since they provide a quantitative measure of the shape of the underlying data distribution. Examples include the covariance matrix feature, bilinear pooling encoding and Gaussian descriptors. Until now, however, these features have been limited to moments of at most second order, i.e. the mean and variance of the data, and hence can be poor descriptors when the underlying distribution is non-Gaussian. This dissertation examines this problem in depth and identifies possible solutions. In particular, we propose feature descriptors based on the empirical moment matrix, which gathers high-order moments and embeds them into the manifold of symmetric positive definite (SPD) matrices. The effectiveness of the proposed approach is illustrated on two computer vision problems: person re-identification (re-ID) and fine-grained classification.
|