Measures of co-expression for improved function prediction of long non-coding RNAs

Abstract. Background: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation term...

Bibliographic Details
Main Authors: Rezvan Ehsani, Finn Drabløs
Format: Article
Language: English
Published: BMC 2018-12-01
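Series: BMC Bioinformatics
Subjects: Function prediction; Gene annotation; Co-expression; Fisher information metric; Sobolev metric; Semantic similarity
Online Access: http://link.springer.com/article/10.1186/s12859-018-2546-y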
id doaj-c96f54cf3f5647acaec371ff67daa259
record_format Article
spelling doaj-c96f54cf3f5647acaec371ff67daa259; 2020-11-25T02:13:09Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-12-01; 19; 1; 1; 12; 10.1186/s12859-018-2546-y; Measures of co-expression for improved function prediction of long non-coding RNAs; Rezvan Ehsani (0: Department of Mathematics, University of Zabol); Finn Drabløs (1: Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology); http://link.springer.com/article/10.1186/s12859-018-2546-y; Function prediction; Gene annotation; Co-expression; Fisher information metric; Sobolev metric; Semantic similarity
collection DOAJ
language English
format Article
sources DOAJ
author Rezvan Ehsani
Finn Drabløs
title Measures of co-expression for improved function prediction of long non-coding RNAs
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-12-01
description Abstract.
Background: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression.
Results: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes.
Conclusion: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
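As an illustration of the co-expression step described in the abstract, the following minimal Python sketch ranks annotated genes by their Pearson or Spearman correlation with a query lncRNA profile. It is not the authors' software: the function names, the toy expression values, and the gene labels are all hypothetical, and the Sobolev and Fisher metrics are left out because this record does not define them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant profile: correlation is undefined; treated as 0 in this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def top_coexpressed(query, profiles, measure, k=2):
    """Return the k gene names whose profiles score highest against the query."""
    scores = {gene: measure(query, profile) for gene, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: one lncRNA and three annotated genes across 5 tissues.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],   # linear in the query
    "geneB": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone but non-linear
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated
}

for name, measure in (("pearson", pearson), ("spearman", spearman)):
    print(name, top_coexpressed(lnc, profiles, measure, k=2))
```

The toy profiles show why the choice of measure matters: geneB follows the query monotonically but not linearly, so Spearman scores it as a perfect match while Pearson scores it lower than geneA. The genes selected this way would then be tested for enriched annotation terms.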
topic Function prediction
Gene annotation
Co-expression
Fisher information metric
Sobolev metric
Semantic similarity
url http://link.springer.com/article/10.1186/s12859-018-2546-y