Summary: | The Hirschfeld-Gebelein-Rényi maximal correlation is a well-known measure of statistical dependence between two (possibly categorical) random variables. In inference problems, the maximal correlation functions can be viewed as so-called features of observed data that carry the largest amount of information about some latent variables. These features are in general non-linear functions and are particularly useful for processing high-dimensional observed data. The alternating conditional expectations (ACE) algorithm is an efficient way to compute these maximal correlation functions. In this paper, we use an information-theoretic approach to interpret the ACE algorithm as computing the singular value decomposition of a linear map between spaces of probability distributions. With this approach, we demonstrate the information-theoretic optimality of the ACE algorithm, analyze its convergence rate and sample complexity, and finally, generalize it to compute multiple pairs of correlation functions from samples.
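As a concrete illustration of the alternating update the abstract describes, below is a minimal NumPy sketch of ACE for two categorical samples, read as a power iteration on the conditional-expectation operator (the singular-vector view the abstract refers to). The function name `ace_maximal_correlation` and its parameters are hypothetical conveniences, not the paper's implementation.

```python
import numpy as np

def ace_maximal_correlation(x, y, n_iter=100, seed=0):
    """Estimate the HGR maximal correlation of two categorical samples
    via alternating conditional expectations (power iteration on the
    empirical conditional-expectation operator). Hypothetical helper,
    not code from the paper."""
    rng = np.random.default_rng(seed)
    # Re-index categories to 0..K-1 so arrays can hold per-category values.
    _, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    g = rng.standard_normal(len(ys))  # random initial feature of Y
    for _ in range(n_iter):
        # f(x) = E[g(Y) | X = x], estimated from the samples
        f = np.bincount(x_idx, weights=g[y_idx]) / np.bincount(x_idx)
        f -= np.mean(f[x_idx])                          # zero mean under empirical P_X
        f /= np.sqrt(np.mean(f[x_idx] ** 2)) + 1e-12    # unit variance
        # g(y) = E[f(X) | Y = y]
        g = np.bincount(y_idx, weights=f[x_idx]) / np.bincount(y_idx)
        g -= np.mean(g[y_idx])
        g /= np.sqrt(np.mean(g[y_idx] ** 2)) + 1e-12
    rho = np.mean(f[x_idx] * g[y_idx])  # estimated maximal correlation
    return rho, f, g

# Toy usage on synthetic data: y is a noisy copy of x.
rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=5000)
y = (x + (rng.random(5000) < 0.2)) % 3
rho, f, g = ace_maximal_correlation(x, y)
print(rho)
```

Each pass applies the two conditional-expectation maps in turn and renormalizes, so the pair (f, g) converges to the top singular functions and rho to the top singular value, matching the abstract's SVD interpretation; the paper's generalization to multiple pairs of correlation functions would correspond to extracting further singular vectors.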