Block-Constraint Laplacian-Regularized Low-Rank Representation and Its Application for Cancer Sample Clustering Based on Integrated TCGA Data

Bibliographic Details
Main Authors: Juan Wang, Jin-Xing Liu, Chun-Hou Zheng, Cong-Hai Lu, Ling-Yun Dai, Xiang-Zhen Kong
Format: Article
Language: English
Published: Hindawi-Wiley 2020-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2020/4865738
Description
Summary: Low-Rank Representation (LRR) is a powerful subspace clustering method because it successfully learns the low-dimensional subspace structure of data. With the breakthrough of “OMics” technology, many LRR-based methods have been proposed and applied to cancer clustering based on gene expression data. Moreover, studies have shown that, besides gene expression data, other genomic data in TCGA also contain important information for cancer research. These genomic data can therefore be integrated as a comprehensive feature source for cancer clustering, and establishing an effective clustering model for the comprehensive analysis of integrated TCGA data has become a key issue. In this paper, we extend the traditional LRR method and propose a novel method named Block-constraint Laplacian-Regularized Low-Rank Representation (BLLRR) to model multigenome data for cancer sample clustering. The proposed method is designed to extract richer subspace structure information from multiple types of genomic data in order to improve the accuracy of cancer sample clustering. To account for the heterogeneity of the different genomic data types, we introduce the block-constraint idea into our method: in the BLLRR decomposition, we treat each type of genomic data as a data block and impose different constraints on different data blocks. In addition, a graph Laplacian is introduced into our method to better learn the topological structure of the data by preserving local geometric information. The experiments demonstrate that the BLLRR method can effectively analyze integrated TCGA data and extract more subspace structure information from multigenome data. It is a reliable and efficient clustering algorithm for cancer sample clustering.
ISSN: 1076-2787, 1099-0526
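
The summary mentions two ingredients: treating each type of genomic data as a data block, and using a graph Laplacian to preserve local geometry among samples. The following is a minimal sketch of those two ideas only, not the authors' BLLRR implementation. The block sizes, the random data, the binary k-nearest-neighbour weighting, and the names `knn_graph_laplacian` and `smoothness` are all assumptions made for illustration; the record does not specify how the paper builds its graph or what constraints it places on each block.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric kNN graph.

    X has shape (features, samples); the graph is built over samples
    (columns). Binary edge weights are an assumption for this sketch.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbours
    # Indices of the k nearest neighbours of each sample.
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W

def smoothness(Z, L):
    """Graph smoothness tr(Z L Z^T), a penalty of the kind commonly
    used by Laplacian-regularized methods (assumed form)."""
    return float(np.trace(Z @ L @ Z.T))

rng = np.random.default_rng(0)
# Hypothetical data blocks: three data types measured on the same 20 samples.
blocks = [rng.normal(size=(50, 20)),
          rng.normal(size=(30, 20)),
          rng.normal(size=(40, 20))]
X = np.vstack(blocks)            # integrated feature matrix (120 x 20)
L = knn_graph_laplacian(X, k=5)  # 20 x 20 sample-graph Laplacian
Z = rng.normal(size=(20, 20))    # stand-in for a representation matrix
penalty = smoothness(Z, L)
```

Keeping the blocks in a list before stacking them leaves room to apply a different constraint or norm to each block, which is the role the summary assigns to the block-constraint idea.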