Measures of co-expression for improved function prediction of long non-coding RNAs
Abstract. Background: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation term...
Main Authors: | Rezvan Ehsani, Finn Drabløs |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-12-01 |
Series: | BMC Bioinformatics |
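Subjects: | Function prediction; Gene annotation; Co-expression; Fisher information metric; Sobolev metric; Semantic similarity |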
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2546-y |
id |
doaj-c96f54cf3f5647acaec371ff67daa259 |
---|---|
record_format |
Article |
spelling |
doaj-c96f54cf3f5647acaec371ff67daa259; 2020-11-25T02:13:09Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-12-01; 19; 1; 1; 12; 10.1186/s12859-018-2546-y; Measures of co-expression for improved function prediction of long non-coding RNAs; Rezvan Ehsani (Department of Mathematics, University of Zabol); Finn Drabløs (Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology); Abstract. Background: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. Results: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. Conclusion: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.; http://link.springer.com/article/10.1186/s12859-018-2546-y; Function prediction; Gene annotation; Co-expression; Fisher information metric; Sobolev metric; Semantic similarity |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Rezvan Ehsani Finn Drabløs |
title |
Measures of co-expression for improved function prediction of long non-coding RNAs |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2018-12-01 |
description |
Abstract. Background: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. Results: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. Conclusion: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available. |
topic |
Function prediction; Gene annotation; Co-expression; Fisher information metric; Sobolev metric; Semantic similarity |
url |
http://link.springer.com/article/10.1186/s12859-018-2546-y |
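
The abstract describes ranking annotated genes by their co-expression with an lncRNA, using Pearson and Spearman correlation among other metrics. The following is a minimal sketch of that ranking step for the two statistical metrics only, not the authors' software. The function names (`pearson`, `spearman`, `top_coexpressed`), the gene labels, and the expression values are hypothetical and chosen purely for illustration. The Sobolev and Fisher metrics are not reproduced here because the record does not define them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def top_coexpressed(target, profiles, metric, k=2):
    """Return the k gene names whose profiles score highest against target."""
    scores = {gene: metric(target, profile) for gene, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up expression values across five tissues (illustration only).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],   # linear in the lncRNA profile
    "geneB": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone but non-linear
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated
}

for name, metric in (("pearson", pearson), ("spearman", spearman)):
    print(name, top_coexpressed(lncrna, profiles, metric, k=2))
```

In this toy data, `geneB` follows the lncRNA monotonically but not linearly, so Spearman scores it as highly as `geneA` while Pearson scores it lower. This illustrates the abstract's point that the outcome depends on how co-expression is estimated. In a full pipeline the top-ranked genes would then be tested for enriched annotation terms, a step this sketch omits.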