Summary: | Representation learning, especially with deep neural networks, has been widely used in fields such as machine learning, data mining, and computer vision. It vectorizes raw data into compact and discriminative feature representations that facilitate downstream tasks. Among existing methods, data reconstruction (e.g., auto-encoders) is a mainstream research direction that naturally discovers the intrinsic structure of data and provides generic feature
representations. However, reconstructing the data alone can deviate from the final goal of a task and limit generalizability. In light of this, we concentrate on learning representations through reconstruction guided by higher-level information. In this dissertation, we investigate a generalized representation learning framework built on an encoder-decoder architecture, where the encoder and decoder are parameterized by different neural networks to handle a variety of
data types. Moreover, we leverage pre-defined data transformations to make reconstruction more versatile. The higher-level information is closely tied to specific tasks. In particular, we instantiate it as cluster partitions, semantic annotations, and labels for three different topics. First, we explore big data cluster analysis from the perspectives of ensemble clustering, multi-view clustering, and deep clustering. We develop low-rank modeling, denoising auto-encoders, and an adversarial
regularizer, respectively, to exploit the structured information inside partitions. Second, we study interpretable user modeling by incorporating recurrent memory networks into a session-to-session model, which uses text annotations from auxiliary sources to explain user behavior via a semantics-informed attention mechanism. Third, we propose a memory-augmented generative model to predict the forthcoming action from an incomplete video. The label of
the action category is used by the discriminator for a classification objective. Building on data reconstruction, we mine higher-level information using task-oriented domain knowledge and impose it on feature representations by designing different regularizers and loss functions. This dissertation demonstrates the potential and flexibility of the proposed methodology across a wide range of applications.
|