Estimating statistical significance of local protein profile-profile alignments

Bibliographic Details
Main Author: Mindaugas Margelevičius
Format: Article
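Language: English
Published: BMC, 2019-08-01
Series: BMC Bioinformatics
Subjects: Homology search; Profile-profile alignment; Random profile model; Statistical significance; Protein structure prediction
Online Access: http://link.springer.com/article/10.1186/s12859-019-2913-3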
id doaj-b80f9e6f6fd74e79afd60786b0278d1c
doi 10.1186/s12859-019-2913-3
affiliation Institute of Biotechnology, Life Sciences Center, Vilnius University
collection DOAJ
issn 1471-2105
description Abstract
Background: Alignment of sequence families described by profiles provides a sensitive means of establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. With the steadily growing amount of sequence data, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it remains an open question whether a single type of distribution governs profile-profile alignment scores, and if so, which one, especially when profile-profile substitution scores include terms such as secondary structure predictions.
Results: This study presents a methodology for estimating the statistical significance of this type of alignment. The methodology rests on a new algorithm for generating random profiles whose alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that statistically characterizing alignments, by establishing how statistical parameters depend on various measures of both individual and pairwise profile characteristics, improves statistical accuracy, sensitivity, and the rate of high-quality alignments. Implemented in the COMER software, the proposed methodology yielded an increase of up to 34.2% in the number of true positives and up to 61.8% in the number of high-quality alignments relative to the previous version of the COMER method.
Conclusions: The more accurate estimation of statistical significance is implemented in the COMER method, which is now more sensitive and produces a higher rate of high-quality profile-profile alignments. The results of the present study also suggest directions for future research.
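The abstract frames significance estimation as comparing an observed alignment score against the scores that alignments of unrelated (here, randomly generated) profiles attain. A minimal Python sketch of that general idea follows, under explicitly stated assumptions: it treats null scores as Gumbel (extreme-value) distributed, which the abstract itself calls an open question for profile-profile scores; it uses single global parameters λ and μ, whereas COMER makes the statistical parameters depend on individual and pairwise profile attributes; and it simulates null scores directly instead of aligning random profiles. The function names (`simulate_null_scores`, `fit_gumbel_moments`, `p_value`, `e_value`) are hypothetical and are not part of COMER.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simulate_null_scores(n, lam, mu, seed=0):
    """Hypothetical stand-in for scores of unrelated-profile alignments:
    n draws from a Gumbel(mu, 1/lam) distribution via inverse-CDF sampling."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
            u = rng.random()
        scores.append(mu - math.log(-math.log(u)) / lam)
    return scores


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel distribution.

    Uses variance = pi^2 / (6 lam^2) and mean = mu + gamma / lam.
    Returns (lam, mu).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return lam, mu


def p_value(score, lam, mu):
    """P(S >= score) for one unrelated pair under the assumed Gumbel null.

    -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny P-values.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


def e_value(score, lam, mu, n_comparisons):
    """Expected number of unrelated pairs scoring >= score,
    assuming n_comparisons independent comparisons."""
    return n_comparisons * p_value(score, lam, mu)


if __name__ == "__main__":
    null = simulate_null_scores(20000, lam=0.3, mu=20.0, seed=42)
    lam, mu = fit_gumbel_moments(null)
    print(f"fitted lambda={lam:.3f}, mu={mu:.2f}")
    for s in (30.0, 45.0, 60.0):
        print(f"score {s:.0f}: P={p_value(s, lam, mu):.3g}, "
              f"E={e_value(s, lam, mu, 10000):.3g}")
```

The method-of-moments fit is used only for brevity; maximum-likelihood fitting is a common, statistically more efficient alternative.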
topic Homology search
Profile-profile alignment
Random profile model
Statistical significance
Protein structure prediction