HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data

The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool to study chromosomal interactions where one can extract meaningful biological information including P(s) curve, topologically associated domains, A/B compartments, and other biological...

Bibliographic Details
Main Authors: Honglong Wu, Xuebin Wang, Mengtian Chu, Dongfang Li, Lixin Cheng, Ke Zhou
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Hi-C; Normalization; Matrix balancing; Doubly stochastic matrix; Sparsity
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021001768
id doaj-f0c5bd642b8649a79836140012ee7ef6
record_format Article
spelling doaj-f0c5bd642b8649a79836140012ee7ef6
2021-05-08T04:22:24Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2021-01-01
19
2637
2645
HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data
Honglong Wu [0]
Xuebin Wang [1]
Mengtian Chu [2]
Dongfang Li [3]
Lixin Cheng [4]
Ke Zhou [5]
[0] Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
[1] BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
[2] BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
[3] Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
[4] Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China; Corresponding authors.
[5] Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; Corresponding authors.
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool to study chromosomal interactions where one can extract meaningful biological information including P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment lengths. High sparsity in particular poses a major challenge for this correction, indicating an urgent need for a stable and efficient method for Hi-C data normalization. Recently, matrix balancing methods such as the Knight-Ruiz (KR) algorithm have been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and in preserving biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB.
http://www.sciencedirect.com/science/article/pii/S2001037021001768
Hi-C
Normalization
Matrix balancing
Doubly stochastic matrix
Sparsity
collection DOAJ
language English
format Article
sources DOAJ
author Honglong Wu
Xuebin Wang
Mengtian Chu
Dongfang Li
Lixin Cheng
Ke Zhou
title HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data
publisher Elsevier
series Computational and Structural Biotechnology Journal
issn 2001-0370
publishDate 2021-01-01
description The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool to study chromosomal interactions where one can extract meaningful biological information including P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment lengths. High sparsity in particular poses a major challenge for this correction, indicating an urgent need for a stable and efficient method for Hi-C data normalization. Recently, matrix balancing methods such as the Knight-Ruiz (KR) algorithm have been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and in preserving biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB.
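To illustrate what balancing a contact matrix toward a doubly stochastic matrix means in the description above, the following is a minimal sketch of the classical Sinkhorn-Knopp iteration in Python. It is neither HCMB nor the KR algorithm. The function name `sinkhorn_knopp`, its parameters, and the toy 3x3 matrix are assumptions made for this illustration only.

```python
import numpy as np


def sinkhorn_knopp(a, max_iter=1000, tol=1e-9):
    """Scale a non-negative square matrix so every row and column sums to 1.

    Illustrative only: this is plain Sinkhorn-Knopp scaling, not HCMB or KR.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square matrix")
    if (a < 0).any():
        raise ValueError("entries must be non-negative")
    # Empty rows or columns cannot be scaled to sum to 1, so reject them here.
    if (a.sum(axis=0) == 0).any() or (a.sum(axis=1) == 0).any():
        raise ValueError("matrix has an empty row or column")

    r = np.ones(a.shape[0])  # row scaling factors
    for _ in range(max_iter):
        c = 1.0 / (a.T @ r)  # make column sums 1 for the current r
        r = 1.0 / (a @ c)    # make row sums 1 for the updated c
        balanced = r[:, None] * a * c[None, :]
        # Row sums are now 1 by construction; check the column sums.
        if np.abs(balanced.sum(axis=0) - 1.0).max() < tol:
            return balanced
    raise RuntimeError("balancing did not converge")


# Toy symmetric contact matrix (made-up counts, all entries positive).
contacts = np.array([[10.0, 4.0, 1.0],
                     [4.0, 8.0, 3.0],
                     [1.0, 3.0, 6.0]])
balanced = sinkhorn_knopp(contacts)
print(balanced.sum(axis=0), balanced.sum(axis=1))
```

The sketch refuses matrices with empty rows or columns, which sparse contact matrices may contain. It makes no attempt to reproduce the line search or projection steps that the description attributes to HCMB.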
topic Hi-C
Normalization
Matrix balancing
Doubly stochastic matrix
Sparsity
url http://www.sciencedirect.com/science/article/pii/S2001037021001768