Clustering Affine Subspaces: Algorithms and Hardness


Bibliographic Details
Main Author: Lee, Euiwoong
Format: Others
Published: 2012
Online Access: https://thesis.library.caltech.edu/7171/1/Thesis2.pdf
Lee, Euiwoong (2012) Clustering Affine Subspaces: Algorithms and Hardness. Master's thesis, California Institute of Technology. doi:10.7907/VF38-NT60. https://resolver.caltech.edu/CaltechTHESIS:07052012-191337554
id ndltd-CALTECH-oai-thesis.library.caltech.edu-7171
collection NDLTD
description <p>We study a generalization of the famous k-center problem in which each object is an affine subspace of dimension Δ, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points (Δ=0) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in R<sup>d</sup> can be modeled as affine subspaces. We give three algorithmic results for different values of k, under the assumption that all subspaces are axis-parallel, the main case of interest because axis-parallel subspaces correspond to missing entries in data tables.<br /> 1) k=1: two polynomial-time approximation schemes, each running in poly(Δ, 1/ε)nd time.<br /> 2) k=2: an O(Δ<sup>1/4</sup>)-approximation algorithm running in poly(n,d,Δ) time.<br /> 3) General k: a polynomial-time approximation scheme running in 2<sup>O(Δk log k(1+1/ε<sup>2</sup>))</sup>nd time.</p> <p>We also prove nearly matching hardness results: in both the general (not necessarily axis-parallel) case for k ≥ 2 and the axis-parallel case for k ≥ 3, unless P = NP, no approximation algorithm achieving any approximation ratio can have a running time polynomial in either k or Δ. Furthermore, assuming that 3-SAT cannot be solved in subexponential time, the dependence on both k and Δ must be exponential in the general case (in the axis-parallel case, only the dependence on k drops to 2<sup>Ω(√k)</sup>). The simplicity of the first and third algorithms suggests that they might actually be used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axis-parallel and general cases for k=2, reveals a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Helly-type theorem.</p>
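To make the incomplete-data model in the description concrete, here is a minimal Python sketch that is not taken from the thesis. It assumes a data row is a list in which None marks a missing entry; the helper names are hypothetical. A row with m missing entries is an axis-parallel affine subspace of dimension m, because the observed coordinates are fixed and the missing ones are free. Rows may differ in how many entries are missing, and Δ would bound that number. The distance from a center point to such a subspace therefore counts only the observed coordinates. The k-center objective is the largest distance from any row to its nearest center. The sketch only evaluates this objective and does not implement any of the approximation algorithms listed above.

```python
import math
from typing import List, Optional, Sequence

# Assumption: a data row with missing entries, where None marks a missing coordinate.
Row = Sequence[Optional[float]]


def subspace_dimension(row: Row) -> int:
    """Dimension of the axis-parallel subspace a row defines: the number of missing entries."""
    return sum(1 for r in row if r is None)


def distance_to_row(center: Sequence[float], row: Row) -> float:
    """Euclidean distance from the point `center` to the axis-parallel affine
    subspace {x : x_i = row[i] for every observed i}.

    Missing coordinates are free, so the closest point in the subspace copies
    the center's value there; only observed coordinates contribute."""
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(center, row) if r is not None))


def k_center_cost(centers: List[Sequence[float]], rows: List[Row]) -> float:
    """k-center objective: the largest distance from any row (subspace) to its nearest center."""
    return max(min(distance_to_row(c, row) for c in centers) for row in rows)
```

Consider the rows [1.0, None, 3.0], [None, 2.0, 3.0] and [4.0, 2.0, None]. A single center (1.0, 2.0, 3.0) gives cost 3.0, set by the third row. Adding a second center (4.0, 2.0, 0.0) brings the cost to 0.0, even though that center's third coordinate differs from every observed value. This happens because the third row's third coordinate is missing and therefore free.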