Semantic Weighted Multi-View Clustering for Web Content

Bibliographic Details
Main Authors: Xiaolong Gong, Linpeng Huang, Tiancheng Luo, Zhiyi Ma
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8824050/
Description
Summary: Clustering is a long-standing and important research problem. However, it remains challenging when handling large-scale web data drawn from different types of information resources, such as user profiles, comments, and user preferences. All these aspects can be seen as different views and often admit the same underlying clustering of the data. In this paper, we present a novel Semantic Weighted Non-negative Matrix Factorization ($SWNMF$) multi-view clustering framework, which provides an efficient weighted matrix factorization framework, handles multi-view web content, and addresses the sparseness problem in the semantic space of the data. Specifically, each view of the dataset forms a huge sparse matrix, which makes the matrix decomposition process non-robust and in turn degrades the accuracy of the clustering results. To address this problem, we use preference information (e.g., rating values) given by users as latent semantic information to handle the features that are unobserved in each data point, so as to resolve the sparseness problem in all view matrices. To combine multiple views in our large corpus, the overall objective of the proposed $SWNMF$ is to minimize the loss function of weighted non-negative matrix factorization (NMF) under the $l_{2,1}$-norm and the co-regularized constraint under the $F$-norm. Extensive experiments on our large-scale multi-view web datasets demonstrate the competitive performance of our solution.
ISSN: 2169-3536
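
Illustrative sketch (not taken from the article): the summary describes an objective that combines a weighted NMF loss under the $l_{2,1}$-norm with a co-regularization term under the $F$-norm. The Python below shows one plausible way such an objective could be evaluated. The exact formulation is not given in this record, so everything here is an assumption: the function names, the per-view weight matrices `weights`, the shared `consensus` matrix, the trade-off parameter `lam`, and the choice to sum the two terms per view.

```python
import numpy as np


def l21_norm(M):
    """l_{2,1}-norm: the sum of the Euclidean norms of the rows of M."""
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())


def swnmf_style_objective(views, weights, bases, coefs, consensus, lam=1.0):
    """Evaluate a hypothetical multi-view weighted NMF objective.

    For each view v (all names and shapes are assumptions):
      X (n x d_v)  non-negative data matrix of the view
      W (n x d_v)  entry-wise weights, e.g. 1 for observed entries and a
                   smaller value for unobserved ones
      U (d_v x k)  non-negative basis matrix
      V (n x k)    non-negative per-sample coefficient matrix
    consensus (n x k) is a coefficient matrix shared across views.
    """
    total = 0.0
    for X, W, U, V in zip(views, weights, bases, coefs):
        # Weighted reconstruction error, measured row-wise (l_{2,1}).
        total += l21_norm(W * (X - V @ U.T))
        # Co-regularization: keep each view's coefficients near the consensus.
        total += lam * np.linalg.norm(V - consensus, "fro") ** 2
    return total


# Tiny single-view example with an exact factorization.
V = np.array([[1.0, 0.0], [0.0, 1.0]])
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = V @ U.T
W = np.ones_like(X)
exact_fit = swnmf_style_objective([X], [W], [U], [V], consensus=V)
```

In this toy case the reconstruction is exact and the view's coefficients equal the consensus, so both terms vanish. Minimizing such an objective would additionally require update rules that keep `U` and `V` non-negative, which this sketch does not attempt.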