Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.
Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology for document representation. However, such models treat all documents as non-discriminatory, so each latent representation depends on all other documents...
Main Authors: | Chao Wei, Senlin Luo, Xincheng Ma, Hao Ren, Ji Zhang, Limin Pan |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2016-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC4718658?pdf=render |
id |
doaj-ac1ddb34836b4c74921a69b5e117dde8 |
---|---|
record_format |
Article |
spelling |
doaj-ac1ddb34836b4c74921a69b5e117dde8 | 2020-11-25T00:59:48Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2016-01-01 | Vol. 11, No. 1, e0146672 | doi:10.1371/journal.pone.0146672 | Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. | Chao Wei; Senlin Luo; Xincheng Ma; Hao Ren; Ji Zhang; Limin Pan | http://europepmc.org/articles/PMC4718658?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chao Wei; Senlin Luo; Xincheng Ma; Hao Ren; Ji Zhang; Limin Pan |
spellingShingle |
Chao Wei; Senlin Luo; Xincheng Ma; Hao Ren; Ji Zhang; Limin Pan; Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.; PLoS ONE |
author_facet |
Chao Wei; Senlin Luo; Xincheng Ma; Hao Ren; Ji Zhang; Limin Pan |
author_sort |
Chao Wei |
title |
Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. |
title_short |
Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. |
title_full |
Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. |
title_fullStr |
Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. |
title_full_unstemmed |
Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation. |
title_sort |
locally embedding autoencoders: a semi-supervised manifold learning approach of document representation. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2016-01-01 |
description |
Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology for document representation. However, such models treat all documents as non-discriminatory, so each latent representation depends on all other documents and cannot provide discriminative document representations. To address this problem, we propose a semi-supervised, manifold-inspired autoencoder that extracts meaningful latent representations of documents, taking the local view that the latent representations of nearby documents should be correlated. We first determine each document's discriminative neighbor set using Euclidean distance in the observation space. The autoencoder is then trained by jointly minimizing the Bernoulli cross-entropy error between input and output and the sum of squared errors between the neighbors of the input and the output. Results on two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks over competing methods. The evidence demonstrates that our method readily captures more discriminative latent representations of new documents. Moreover, meaningful combinations of words can be efficiently discovered from the activated features, which improves the comprehensibility of the latent representation. |
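The description above outlines the training objective: a Bernoulli cross-entropy reconstruction term plus a squared-error term tying each document's output to its Euclidean-distance neighbors. Below is a minimal, illustrative sketch of that objective, not the authors' implementation: the bag-of-words scaling, the interpretation of the neighbor term (reconstruction vs. each neighbor), the neighborhood size `k`, and the trade-off weight `lam` are assumptions introduced here for illustration only.

```python
# Minimal sketch of the joint objective described in the abstract (NOT the
# authors' released code). Assumptions, clearly labelled:
#   * documents are bag-of-words vectors scaled to [0, 1],
#   * the "discriminative neighbor set" is the k nearest documents by
#     Euclidean distance in the observation space,
#   * the neighbor term is the summed squared error between a document's
#     reconstruction and each of its neighbors, weighted by a hypothetical
#     trade-off coefficient `lam`.
import numpy as np

rng = np.random.default_rng(0)

def k_nearest_neighbors(X, k):
    """Indices of the k nearest rows (Euclidean) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a document is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W_enc, b_enc, W_dec, b_dec):
    """One-hidden-layer autoencoder: encode then decode with sigmoids."""
    H = sigmoid(X @ W_enc + b_enc)        # latent representation
    X_hat = sigmoid(H @ W_dec + b_dec)    # reconstruction in [0, 1]
    return H, X_hat

def joint_loss(X, X_hat, neighbors, lam=0.1):
    """Bernoulli cross-entropy + lam * summed squared error to neighbors."""
    eps = 1e-8
    bce = -np.sum(X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps))
    nbr = sum(np.sum((X[neighbors[i]] - X_hat[i]) ** 2) for i in range(len(X)))
    return bce + lam * nbr

# Toy usage: 20 documents, 50-term vocabulary, 10-dimensional latent space.
X = rng.random((20, 50))
W_enc, b_enc = rng.normal(0, 0.1, (50, 10)), np.zeros(10)
W_dec, b_dec = rng.normal(0, 0.1, (10, 50)), np.zeros(50)
neighbors = k_nearest_neighbors(X, k=3)
_, X_hat = forward(X, W_enc, b_enc, W_dec, b_dec)
print("joint objective:", joint_loss(X, X_hat, neighbors))
```

In practice the objective would be minimized with gradient-based training of the encoder and decoder weights; the sketch above only evaluates it once on random toy data to make the two terms of the loss concrete.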
url |
http://europepmc.org/articles/PMC4718658?pdf=render |
work_keys_str_mv |
AT chaowei locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation AT senlinluo locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation AT xinchengma locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation AT haoren locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation AT jizhang locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation AT liminpan locallyembeddingautoencodersasemisupervisedmanifoldlearningapproachofdocumentrepresentation |
_version_ |
1725216037959368704 |