HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biological...
Main Authors: | Honglong Wu, Xuebin Wang, Mengtian Chu, Dongfang Li, Lixin Cheng, Ke Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-01-01 |
Series: | Computational and Structural Biotechnology Journal |
Subjects: | Hi-C; Normalization; Matrix balancing; Doubly stochastic matrix; Sparsity |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037021001768 |
id |
doaj-f0c5bd642b8649a79836140012ee7ef6 |
---|---|
record_format |
Article |
spelling |
doaj-f0c5bd642b8649a79836140012ee7ef6; 2021-05-08T04:22:24Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2021-01-01; volume 19, pages 2637-2645.
Title: HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data.
Authors: Honglong Wu (0), Xuebin Wang (1), Mengtian Chu (2), Dongfang Li (3), Lixin Cheng (4), Ke Zhou (5).
Affiliations:
(0) Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.
(1) BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.
(2) BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.
(3) Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.
(4) Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China; corresponding author.
(5) Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; corresponding author.
Abstract: The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment length. High sparsity in particular poses a major challenge for this correction, indicating an urgent need for a stable and efficient method for Hi-C data normalization. Matrix balancing methods such as the Knight-Ruiz (KR) algorithm have recently been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), which is based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and preserves biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB.
URL: http://www.sciencedirect.com/science/article/pii/S2001037021001768
Keywords: Hi-C; Normalization; Matrix balancing; Doubly stochastic matrix; Sparsity |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Honglong Wu, Xuebin Wang, Mengtian Chu, Dongfang Li, Lixin Cheng, Ke Zhou |
author_sort |
Honglong Wu |
title |
HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data |
title_sort |
hcmb: a stable and efficient algorithm for processing the normalization of highly sparse hi-c contact data |
publisher |
Elsevier |
series |
Computational and Structural Biotechnology Journal |
issn |
2001-0370 |
publishDate |
2021-01-01 |
description |
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment length. High sparsity in particular poses a major challenge for this correction, indicating an urgent need for a stable and efficient method for Hi-C data normalization. Matrix balancing methods such as the Knight-Ruiz (KR) algorithm have recently been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), which is based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and preserves biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/HCMB. |
topic |
Hi-C; Normalization; Matrix balancing; Doubly stochastic matrix; Sparsity |
url |
http://www.sciencedirect.com/science/article/pii/S2001037021001768 |
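As a rough illustration of the matrix balancing named in this record's topic terms, the sketch below rescales a symmetric contact matrix so that every row and column sums to 1, which makes it doubly stochastic. It uses a simple damped Sinkhorn-style iteration; it is not HCMB and not the Knight-Ruiz algorithm. The function name `balance_symmetric`, its parameters, and the toy matrix are hypothetical and chosen only for this example.

```python
from math import sqrt


def balance_symmetric(matrix, max_iter=1000, tol=1e-10):
    """Scale a symmetric non-negative matrix so each row sums to 1.

    Illustrative only: a damped Sinkhorn-style iteration, NOT the HCMB
    or Knight-Ruiz method. Returns (balanced_matrix, scale_vector),
    where balanced[i][j] = scale[i] * matrix[i][j] * scale[j].
    """
    n = len(matrix)
    scale = [1.0] * n
    for _ in range(max_iter):
        # Row sums of the currently scaled matrix.
        row_sums = [
            scale[i] * sum(matrix[i][j] * scale[j] for j in range(n))
            for i in range(n)
        ]
        # An all-zero row can never be scaled to sum to 1.
        if any(s == 0.0 for s in row_sums):
            raise ValueError("empty row: matrix cannot be balanced")
        # Stop once every row sum is within tol of 1.
        if max(abs(s - 1.0) for s in row_sums) < tol:
            break
        # Damped update: divide each scale factor by sqrt(row sum).
        scale = [scale[i] / sqrt(row_sums[i]) for i in range(n)]
    balanced = [
        [scale[i] * matrix[i][j] * scale[j] for j in range(n)]
        for i in range(n)
    ]
    return balanced, scale


# Toy symmetric "contact matrix" (hypothetical values).
contacts = [
    [4.0, 2.0, 1.0],
    [2.0, 3.0, 1.0],
    [1.0, 1.0, 2.0],
]
balanced, scale = balance_symmetric(contacts)
for row in balanced:
    print([round(v, 4) for v in row], "row sum:", round(sum(row), 6))
```

The explicit check for empty rows hints at why sparsity matters for balancing in general: a bin with no contacts cannot be scaled to sum to 1, so such bins have to be filtered out or handled separately before a scheme like this one can be applied.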