Sublinear-time algorithms for counting star subgraphs with applications to join selectivity estimation

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 37-43). === We study the problem of estimating the value of sums of the form ... when one has...

Bibliographic Details
Main Author: Peebles, John Lee Thompson, Jr
Other Authors: Jon Kelner and Ronitt Rubinfeld.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/103746
id ndltd-MIT-oai-dspace.mit.edu-1721.1-103746
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-103746 2019-05-02T16:05:33Z Sublinear-time algorithms for counting star subgraphs with applications to join selectivity estimation Peebles, John Lee Thompson, Jr Jon Kelner and Ronitt Rubinfeld. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 37-43). by John Lee Thompson Peebles, Jr. S.M. 2016-07-18T20:05:56Z 2016-07-18T20:05:56Z 2016 2016 Thesis http://hdl.handle.net/1721.1/103746 953583023 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 60 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Peebles, John Lee Thompson, Jr
Sublinear-time algorithms for counting star subgraphs with applications to join selectivity estimation
description Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 37-43). === We study the problem of estimating the value of sums of the form ... when one has the ability to sample x_i ≥ 0 with probability proportional to its magnitude. When p = 2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {x_i} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a ...-multiplicative approximation of S_p has query and time complexities ... Here, m = ... is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {x_i} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds [Gonen, Ron, and Shavitt. SIAM J. Comput., 25 (2011), pp. 1365-1411]. With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where ... , our upper bound is ... in contrast to their ... lower bound when no random edge queries are available.
In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded (or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded), we give a sublinear-time algorithm via a reduction to the undirected case. === by John Lee Thompson Peebles, Jr. === S.M.
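For orientation, the quantities named in the description can be illustrated with a small Python sketch. This is not the thesis's algorithm. It assumes the standard reading that a p-star is a center vertex together with p of its neighbors, so the exact count is the sum over vertices of C(deg(v), p), and it pairs that exact (linear-time) baseline with a naive importance-sampling estimator built on uniform edge samples. All function names are hypothetical, and the estimator precomputes degrees only for simplicity; a sublinear method would rely on a degree-query oracle instead.

```python
import random
from collections import Counter
from math import comb


def count_p_stars(edges, p):
    """Exact p-star count of a simple undirected graph: sum of C(deg(v), p)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(comb(d, p) for d in degree.values())


def self_join_pairs(column):
    """Unordered pairs of distinct rows sharing a value: sum of C(x_i, 2),
    where x_i is the number of occurrences of the i-th distinct value."""
    return sum(comb(c, 2) for c in Counter(column).values())


def join_size(left_column, right_column):
    """Size of an equi-join of two columns: sum over values of the product
    of occurrence counts. With arcs (u, v) as rows, joining heads to tails
    gives sum of in_deg(v) * out_deg(v), the number of two-step directed
    walks (this includes u -> v -> u when both arcs exist)."""
    left, right = Counter(left_column), Counter(right_column)
    return sum(left[value] * right[value] for value in left)


def estimate_p_stars(edges, p, num_samples, rng=None):
    """Naive unbiased estimator (an assumption of this sketch, not the
    thesis's method). A uniform edge followed by a uniform endpoint picks
    vertex v with probability deg(v) / (2m), so 2m * C(deg(v), p) / deg(v)
    has expectation equal to the p-star count."""
    rng = rng or random.Random()
    degree = Counter()  # stands in for a degree-query oracle
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    total = 0.0
    for _ in range(num_samples):
        endpoint = rng.choice(rng.choice(edges))
        d = degree[endpoint]
        total += comb(d, p) / d
    return 2 * m * total / num_samples


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]  # one center with three leaves
    print(count_p_stars(star, 2))
    print(self_join_pairs(["a", "a", "a", "b"]))
    print(estimate_p_stars(star, 2, 1000, random.Random(0)))
```

The naive estimator can have high variance when a few vertices carry most of the stars; the complexity bounds and lower bounds stated in the description concern the thesis's own algorithm, not this sketch.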
author2 Jon Kelner and Ronitt Rubinfeld.
author_facet Jon Kelner and Ronitt Rubinfeld.
Peebles, John Lee Thompson, Jr
author Peebles, John Lee Thompson, Jr
author_sort Peebles, John Lee Thompson, Jr
title Sublinear-time algorithms for counting star subgraphs with applications to join selectivity estimation
publisher Massachusetts Institute of Technology
publishDate 2016
url http://hdl.handle.net/1721.1/103746
work_keys_str_mv AT peeblesjohnleethompsonjr sublineartimealgorithmsforcountingstarsubgraphswithapplicationstojoinselectivityestimation
_version_ 1719034146369568768