A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

Abstract Background Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but no evidence of their existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge in understanding their diverse functions. Techniques to deci...

Bibliographic Details
Main Authors: Johny Ijaq, Girik Malik, Anuj Kumar, Partha Sarathi Das, Narendra Meena, Neeraja Bethi, Vijayaraghava Seshadri Sundararajan, Prashanth Suravajhala
Format: Article
Language: English
Published: BMC 2019-01-01
Series: BMC Bioinformatics
Subjects: Hypothetical proteins; Machine learning; Classification features; Functional genomics
Online Access: http://link.springer.com/article/10.1186/s12859-018-2554-y
id doaj-adebf63a550c41b8b83b0b7a101de575
record_format Article
spelling doaj-adebf63a550c41b8b83b0b7a101de575 2020-11-25T02:08:02Z eng BMC BMC Bioinformatics 1471-2105 2019-01-01 20118 10.1186/s12859-018-2554-y
A model to predict the function of hypothetical proteins through a nine-point classification scoring schema
Johny Ijaq (Department of Biotechnology, Osmania University)
Girik Malik (Department of Pediatrics, The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, The Ohio State University)
Anuj Kumar (Bioclues.org)
Partha Sarathi Das (Bioclues.org)
Narendra Meena (Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research)
Neeraja Bethi (Department of Biotechnology, Osmania University)
Vijayaraghava Seshadri Sundararajan (Bioclues.org)
Prashanth Suravajhala (Bioclues.org)
http://link.springer.com/article/10.1186/s12859-018-2554-y
Hypothetical proteins
Machine learning
Classification features
Functional genomics
collection DOAJ
language English
format Article
sources DOAJ
author Johny Ijaq
Girik Malik
Anuj Kumar
Partha Sarathi Das
Narendra Meena
Neeraja Bethi
Vijayaraghava Seshadri Sundararajan
Prashanth Suravajhala
title A model to predict the function of hypothetical proteins through a nine-point classification scoring schema
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-01-01
description Abstract. Background: Hypothetical proteins (HPs) are proteins that are predicted to be expressed in an organism but for whose existence there is no known evidence. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Techniques to decipher sequence-structure-function relationships, especially for functional modelling of HPs, have been developed by researchers, but using the features as classifiers for HPs has not been attempted. With the rise in the number of annotation strategies, next-generation sequencing methods have furthered the understanding of HP functions. Results: In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, and known databases and visualizers, which were used to validate protein interactions. In this study, we introduce three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling, and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics, with accuracy improving for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67). Conclusion: With the introduction of three new classification features, the nine-point classification scoring schema annotates HPs functionally with improved accuracy.
topic Hypothetical proteins
Machine learning
Classification features
Functional genomics
url http://link.springer.com/article/10.1186/s12859-018-2554-y
work_keys_str_mv AT johnyijaq amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT girikmalik amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT anujkumar amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT parthasarathidas amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT narendrameena amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT neerajabethi amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT vijayaraghavaseshadrisundararajan amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT prashanthsuravajhala amodeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT johnyijaq modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT girikmalik modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT anujkumar modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT parthasarathidas modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT narendrameena modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT neerajabethi modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT vijayaraghavaseshadrisundararajan modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
AT prashanthsuravajhala modeltopredictthefunctionofhypotheticalproteinsthroughaninepointclassificationscoringschema
_version_ 1724928018959302656
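
The abstract describes scoring each hypothetical protein on nine classification features and passing those scores to classifiers such as a perceptron. The sketch below illustrates that general idea only and is not the authors' pipeline: the feature names are hypothetical paraphrases of the abstract's list, each feature is assumed to be a binary 0/1 score, and the two training rows are made-up toy inputs rather than data from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature names paraphrasing the abstract:
# six from the earlier schema, then the three added in this study.
FEATURES: List[str] = [
    "protein_family", "orthology", "interaction",
    "bidirectional_best_blast", "sorting_signal", "database_visualizer",
    "pseudogene", "homology_model", "noncoding_rna",
]


def score_vector(evidence: Dict[str, bool]) -> List[int]:
    """Turn per-feature evidence into a nine-element 0/1 vector (assumed encoding)."""
    return [1 if evidence.get(name, False) else 0 for name in FEATURES]


def predict(weights: Sequence[float], bias: float, x: Sequence[int]) -> int:
    """Perceptron decision rule: 1 if the weighted sum plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(
    rows: Sequence[Sequence[int]],
    labels: Sequence[int],
    epochs: int = 10,
    lr: float = 1.0,
) -> Tuple[List[float], float]:
    """Classic perceptron update: adjust weights only on misclassified rows."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


if __name__ == "__main__":
    # Toy rows (assumptions, not data from the paper): full evidence vs. none.
    rows = [[1] * 9, [0] * 9]
    labels = [1, 0]
    weights, bias = train_perceptron(rows, labels)
    query = score_vector({"orthology": True, "pseudogene": True})
    print(query, predict(weights, bias, query))
```

A binary encoding keeps the example small; the record does not say how the published schema weights or scales each feature, so any real use would need the scoring rules from the article itself.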