Understanding the limitations of network online learning

Abstract: Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading...

Bibliographic Details
Main Authors: Timothy LaRock, Timothy Sakharov, Sahely Bhadra, Tina Eliassi-Rad
Format: Article
Language: English
Published: SpringerOpen 2020-09-01
Series: Applied Network Science
Subjects: Partially observed networks; Online learning; Heavy-tailed target distributions
Online Access: http://link.springer.com/article/10.1007/s41109-020-00296-w
id doaj-9074238bf59f47f6a7a59fc45b0d0716
record_format Article
spelling doaj-9074238bf59f47f6a7a59fc45b0d0716
2020-11-25T02:49:29Z
eng
SpringerOpen
Applied Network Science
2364-8228
2020-09-01
5
1
1
25
10.1007/s41109-020-00296-w
Understanding the limitations of network online learning
Timothy LaRock (Network Science Institute, Northeastern University, 360 Huntington Ave)
Timothy Sakharov (Network Science Institute, Northeastern University, 360 Huntington Ave)
Sahely Bhadra (Department of Computer Science and Engineering, Indian Institute of Technology Palakkad)
Tina Eliassi-Rad (Network Science Institute, Northeastern University, 360 Huntington Ave)
Abstract Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration.
http://link.springer.com/article/10.1007/s41109-020-00296-w
Partially observed networks
Online learning
Heavy-tailed target distributions
collection DOAJ
language English
format Article
sources DOAJ
author Timothy LaRock
Timothy Sakharov
Sahely Bhadra
Tina Eliassi-Rad
title Understanding the limitations of network online learning
publisher SpringerOpen
series Applied Network Science
issn 2364-8228
publishDate 2020-09-01
description Abstract Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration.
topic Partially observed networks
Online learning
Heavy-tailed target distributions
url http://link.springer.com/article/10.1007/s41109-020-00296-w
_version_ 1724743176165523456
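
The querying problem stated in the abstract (a partially observed network, a query budget, and a model trained online through exploration and exploitation) can be illustrated with a small sketch. This is not the NOL* family from the article: it is a plain epsilon-greedy loop with a two-weight linear reward model, and the function name query_loop, the single observed-degree feature, the learning rate, and the definition of reward as the number of newly discovered nodes are all assumptions made here for illustration.

```python
import random


def query_loop(full_graph, observed, budget, epsilon=0.1, lr=0.01, seed=0):
    """Epsilon-greedy node querying on a partially observed graph (illustrative).

    full_graph: dict node -> set of neighbours; stands in for the remote API.
    observed:   dict node -> set of known neighbours; updated in place.
    Returns (queried nodes in order, reward obtained by each query).
    """
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # hypothetical model: bias + observed-degree weight
    queried, rewards = [], []

    def features(node):
        return [1.0, float(len(observed[node]))]

    def predict(node):
        return sum(w * x for w, x in zip(weights, features(node)))

    for _ in range(budget):
        # Candidates are observed nodes that have not been queried yet.
        candidates = sorted(n for n in observed if n not in queried)
        if not candidates:
            break
        if rng.random() < epsilon:
            node = rng.choice(candidates)         # explore: random candidate
        else:
            node = max(candidates, key=predict)   # exploit: best predicted reward
        x = features(node)
        predicted = predict(node)

        # "Query" the node: reveal all of its edges and any new neighbours.
        before = len(observed)
        for nbr in full_graph[node]:
            observed[node].add(nbr)
            observed.setdefault(nbr, set()).add(node)
        reward = len(observed) - before           # newly discovered nodes

        # Online update: one gradient step on the squared prediction error.
        error = reward - predicted
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        queried.append(node)
        rewards.append(reward)
    return queried, rewards
```

Calling query_loop with a dictionary-based graph, a small observed sample, and a budget returns the order in which nodes were queried and how many new nodes each query revealed. Setting epsilon to 0 makes the loop purely greedy, while larger values spend more of the budget on random exploration.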