On protocols and measures for the validation of supervised methods for the inference of biological networks



Bibliographic Details
Main Authors:	Marie Schrynemackers, Robert Kueffner, Pierre Geurts
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2013-12-01
Series:	Frontiers in Genetics
Subjects:	Cross-validation; Biological network inference; supervised learning; evaluation protocols; ROC curves; precision-recall curves
Online Access:	http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00262/full

Description
Summary:	Networks provide a natural representation of molecular biology knowledge, in particular to model relationships between biological entities such as genes, proteins, drugs, or diseases. Because of the effort, the cost, or the lack of the experiments necessary for the elucidation of these networks, computational approaches for network inference have been frequently investigated in the literature. In this paper, we examine the assessment of supervised network inference. Supervised inference is based on machine learning techniques that infer the network from a training sample of known interacting and possibly non-interacting entities and additional measurement data. While these methods are very effective, their reliable validation in silico poses a challenge, since both prediction and validation need to be performed on the basis of the same partially known network. Cross-validation techniques need to be specifically adapted to classification problems on pairs of objects. We perform a critical review and assessment of protocols and measures proposed in the literature and derive specific guidelines on how to best exploit and evaluate machine learning techniques for network inference. Through theoretical considerations and in silico experiments, we analyze in depth how important factors influence the outcome of performance estimation. These factors include the amount of information available for the interacting entities, the sparsity and topology of biological networks, and the lack of experimentally verified non-interacting pairs.
ISSN:	1664-8021
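The summary notes that cross-validation must be adapted when the objects being classified are pairs of entities. One common way to do this, sketched below as an illustration only, is to split the entities rather than the pairs into folds, so that each test pair can be grouped by how many of its two entities were seen during training. The helper `node_fold_split` and its round-robin fold layout are hypothetical, not the authors' protocol or code, and the sketch assumes a homogeneous network (for example protein-protein) with unordered pairs; interaction labels would simply travel with each pair.

```python
import random
from itertools import combinations


def node_fold_split(nodes, pairs, k, seed=0):
    """Hypothetical helper: yield (fold, train, one_new, both_new) per fold.

    Entities (not pairs) are shuffled and dealt into k folds. For each fold,
    a pair is a training pair only if neither endpoint is held out; pairs that
    touch held-out entities are test pairs, grouped by how many endpoints are
    unseen during training (one or two).
    """
    shuffled = list(nodes)
    random.Random(seed).shuffle(shuffled)
    folds = [set(shuffled[i::k]) for i in range(k)]
    for f, held_out in enumerate(folds):
        train, one_new, both_new = [], [], []
        for u, v in pairs:
            # Count how many endpoints of this pair are held out in this fold.
            n_new = (u in held_out) + (v in held_out)
            if n_new == 0:
                train.append((u, v))
            elif n_new == 1:
                one_new.append((u, v))
            else:
                both_new.append((u, v))
        yield f, train, one_new, both_new


# Toy example: 6 entities, all 15 unordered pairs, 3 folds of 2 entities each.
nodes = ["a", "b", "c", "d", "e", "f"]
pairs = list(combinations(nodes, 2))
for f, train, one_new, both_new in node_fold_split(nodes, pairs, k=3):
    # Each fold prints: 6 training pairs, 8 with one new entity, 1 with two.
    print(f, len(train), len(one_new), len(both_new))
```

Reporting performance separately for the two groups of test pairs can matter because pairs whose entities were both seen in training are often easier to predict, so pooling all test pairs may give an optimistic picture of performance on entirely new entities.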
