An information-theoretic approach to unsupervised feature selection for high-dimensional data

In this paper, we model the unsupervised learning of a sequence of observed data vectors as a problem of extracting joint patterns among random variables. In particular, we formulate an information-theoretic problem to extract common features of random variables by measuring the loss of total correlation given the feature. This problem can be solved by a local geometric approach, where the solutions can be represented as singular vectors of some matrices related to the pairwise distributions of the data. In addition, we illustrate how these solutions can be translated into feature functions in machine learning, which can be computed by efficient algorithms from data vectors. Moreover, we present a generalization of the HGR maximal correlation based on these feature functions, which can be viewed as a nonlinear generalization of linear PCA. Finally, simulation results show that the extracted feature functions perform well on real-world problems.
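The singular-vector connection mentioned in the abstract can be illustrated for the two-variable case. A minimal sketch, assuming discrete variables with a known joint pmf and using plain NumPy (the function name `hgr_features` and the toy distribution are illustrative, not from the paper): the HGR maximal correlation and its optimal feature functions are obtained from the SVD of the matrix B[x, y] = P(x, y) / sqrt(P(x) P(y)).

```python
import numpy as np

def hgr_features(P_xy):
    """Compute the HGR maximal correlation of a discrete joint pmf via SVD.

    Illustrative sketch: the top singular value of B is always 1 and
    corresponds to the trivial constant features; the second singular
    value is the HGR maximal correlation rho(X; Y), and its singular
    vectors (rescaled by the marginals) give the feature functions
    f(x), g(y). Signs of f and g are only determined up to a flip.
    """
    Px = P_xy.sum(axis=1)          # marginal of X
    Py = P_xy.sum(axis=0)          # marginal of Y
    B = P_xy / np.sqrt(np.outer(Px, Py))
    U, s, Vt = np.linalg.svd(B)    # singular values in descending order
    f = U[:, 1] / np.sqrt(Px)      # feature function on X (zero mean, unit variance)
    g = Vt[1, :] / np.sqrt(Py)     # feature function on Y
    return s[1], f, g

# Toy symmetric joint distribution over two binary variables.
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
rho, f, g = hgr_features(P)
print(rho)  # -> 0.6 for this joint pmf
```

For this 2x2 example the singular values of B are 1 and 0.6, so the maximal correlation is 0.6 and the optimal features are (up to sign) f = g = (+1, -1), i.e. the parity of each variable, matching the intuition that the two variables agree with probability 0.8.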


Bibliographic Details
Main Authors: Huang, Shao-Lun (Author), Zhang, Lin (Author), Zheng, Lizhong (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language:English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2021-06-17T17:50:21Z.
Subjects:
Online Access:Get fulltext
LEADER 01669 am a22001813u 4500
001 131015
042 |a dc 
100 1 0 |a Huang, Shao-Lun  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
700 1 0 |a Zhang, Lin  |e author 
700 1 0 |a Zheng, Lizhong  |e author 
245 0 0 |a An information-theoretic approach to unsupervised feature selection for high-dimensional data 
260 |b Institute of Electrical and Electronics Engineers (IEEE),   |c 2021-06-17T17:50:21Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/131015 
520 |a In this paper, we model the unsupervised learning of a sequence of observed data vectors as a problem of extracting joint patterns among random variables. In particular, we formulate an information-theoretic problem to extract common features of random variables by measuring the loss of total correlation given the feature. This problem can be solved by a local geometric approach, where the solutions can be represented as singular vectors of some matrices related to the pairwise distributions of the data. In addition, we illustrate how these solutions can be translated into feature functions in machine learning, which can be computed by efficient algorithms from data vectors. Moreover, we present a generalization of the HGR maximal correlation based on these feature functions, which can be viewed as a nonlinear generalization of linear PCA. Finally, simulation results show that the extracted feature functions perform well on real-world problems. 
546 |a en 
655 7 |a Article 
773 |t 2017 IEEE Information Theory Workshop (ITW)