Applying negative rule mining to improve genome annotation

Abstract. Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to...

Bibliographic Details
Main Authors:	Frishman Goar, Artamonova Irena I, Frishman Dmitrij
Format:	Article
Language:	English
Published:	BMC 2007-07-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/261

id	doaj-a4de314a78b941f58390e2834610a838
record_format	Article
spelling	doaj-a4de314a78b941f58390e2834610a838; 2020-11-25T01:01:00Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2007-07-01; 8; 1; 261; 10.1186/1471-2105-8-261; Applying negative rule mining to improve genome annotation; Frishman Goar; Artamonova Irena I; Frishman Dmitrij; http://www.biomedcentral.com/1471-2105/8/261
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Frishman Goar; Artamonova Irena I; Frishman Dmitrij
title	Applying negative rule mining to improve genome annotation
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2007-07-01
description	Abstract. Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
url	http://www.biomedcentral.com/1471-2105/8/261
_version_	1725211405966114816
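
The idea summarized in the abstract, flagging database entries that violate a strong negative association rule, can be illustrated with a small sketch. This is not the authors' implementation: the rule form ("feature a implies NOT feature b"), the thresholds `min_support` and `min_confidence`, the function name `mine_negative_rules`, and the toy feature names and protein identifiers are all assumptions made for illustration only.

```python
from itertools import permutations


def mine_negative_rules(records, min_support=3, min_confidence=0.8):
    """Find rules 'a implies NOT b' that hold for most, but not all, entries.

    records maps an entry id to its set of annotation features.
    Returns a list of (a, b, confidence, exceptions) tuples, where
    exceptions are the entries that carry both a and b.
    """
    items = sorted(set().union(*records.values()))
    # Number of entries carrying each feature.
    support = {i: sum(i in feats for feats in records.values()) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        # Both features must be frequent, otherwise the rule is trivial.
        if support[a] < min_support or support[b] < min_support:
            continue
        with_a = [pid for pid, feats in records.items() if a in feats]
        exceptions = [pid for pid in with_a if b in records[pid]]
        confidence = (len(with_a) - len(exceptions)) / len(with_a)
        # Keep strong rules that still have exceptions: those are the suspects.
        if exceptions and confidence >= min_confidence:
            rules.append((a, b, confidence, exceptions))
    return rules


# Hypothetical toy annotation: entry b5 combines a bacterial origin with a
# nuclear localization, a combination that the other entries never show.
records = {
    "b1": {"Bacteria"}, "b2": {"Bacteria"}, "b3": {"Bacteria"},
    "b4": {"Bacteria"}, "b5": {"Bacteria", "Nucleus"},
    "e1": {"Eukaryota", "Nucleus"}, "e2": {"Eukaryota", "Nucleus"},
    "e3": {"Eukaryota", "Nucleus"}, "e4": {"Eukaryota", "Nucleus"},
    "e5": {"Eukaryota", "Nucleus"},
}

rules = mine_negative_rules(records)
flagged = {pid for _, _, _, exceptions in rules for pid in exceptions}
for a, b, confidence, exceptions in rules:
    print(f"{a} -> NOT {b} (confidence {confidence:.2f}), exceptions: {exceptions}")
```

In this toy data only entry `b5` is flagged. Deciding which of its two features is the wrong one would still require manual inspection, in line with the abstract's statement that exceptions point to at least one wrong attribute in the combination.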

Similar Items