Summary: | <p>Abstract</p> <p>Background</p> <p>The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened by a similar biological background of the query sequence and identified database sequences. However, few tools exist so far, that provide a means to include functional information in sequence database searches.</p> <p>Results</p> <p>We present ProFAT, a web-based tool for the functional annotation of protein sequences based on remote sequence similarity. ProFAT combines sensitive sequence database search methods and a fold recognition algorithm with a simple text-mining approach. ProFAT extracts identified hits based on their biological background by keyword-mining of annotations, features and most importantly, literature associated with a sequence entry. A user-provided keyword list enables the user to specifically search for weak, but biologically relevant homologues of an input query. The ProFAT server has been evaluated using the complete set of proteins from three different domain families, including their weak relatives and could correctly identify between 90% and 100% of all domain family members studied in this context. ProFAT has furthermore been applied to a variety of proteins from different cellular contexts and we provide evidence on how ProFAT can help in functional prediction of proteins based on remotely conserved proteins.</p> <p>Conclusion</p> <p>By employing sensitive database search programs as well as exploiting the functional information associated with database sequences, ProFAT can detect remote, but biologically relevant relationships between proteins and will assist researchers in the prediction of protein function based on remote homologies.</p>
|