Volume 2 (2006)
Article 12 pp. 225-247
Matrix Approximation and Projective Clustering via Volume Sampling
by Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang
Received: August 11, 2005
Revised: September 28, 2006
Published: October 12, 2006
DOI: 10.4086/toc.2006.v002a012
Keywords: algorithms, matrix approximation, projective clustering
ACM Classification: G.1.3, F.2.1
AMS Classification: 68W25, 65F15
Abstract:
Frieze, Kannan, and Vempala (JACM 2004) proved that a small sample of rows of a given matrix A spans the rows of a low-rank approximation D that minimizes ||A − D||_F within a small additive error, and the sampling can be done efficiently using just two passes over the matrix. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner (adaptive sampling). Using this result, we give a pass-efficient algorithm for computing a low-rank approximation with reduced additive error.
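As a rough illustration of the adaptive scheme, consider the following minimal NumPy sketch (the function name, parameters, and numerical guards are ours, not from the paper): each round draws rows with probability proportional to their squared norm in the current residual, then projects the residual off the span of everything sampled so far.

    import numpy as np

    def adaptive_row_sampling(A, rounds, rows_per_round, seed=None):
        # Hypothetical sketch of adaptive sampling: in each round, rows are
        # drawn with probability proportional to their squared norm in the
        # current residual E, and E is then recomputed by projecting A off
        # the span of all rows sampled so far.
        rng = np.random.default_rng(seed)
        basis = []                                # orthonormal basis for the sampled span
        E = A.astype(float)                       # residual, initially A itself
        for _ in range(rounds):
            norms = np.einsum('ij,ij->i', E, E)   # squared row norms of the residual
            if norms.sum() == 0:                  # A is already captured exactly
                break
            idx = rng.choice(A.shape[0], size=rows_per_round, p=norms / norms.sum())
            for i in idx:
                v = A[i].astype(float)
                for b in basis:                   # Gram-Schmidt against current basis
                    v = v - (v @ b) * b
                if np.linalg.norm(v) > 1e-12:
                    basis.append(v / np.linalg.norm(v))
            B = np.vstack(basis)
            E = A - (A @ B.T) @ B                 # project A off the sampled span
        return np.vstack(basis) if basis else np.zeros((0, A.shape[1]))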
Our second result is that there exist k rows of A whose span contains the rows of a multiplicative (k+1)-approximation to the best rank-k matrix; moreover, this subset can be found by sampling k-subsets of rows from a natural distribution (volume sampling), in which a k-subset is chosen with probability proportional to the squared volume of the simplex spanned by those rows together with the origin.
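For intuition, the volume sampling distribution can be realized by brute force on small inputs. The sketch below (exponential in k; names are ours) only illustrates the distribution itself, not the paper's machinery: the Gram determinant det(A_S A_S^T) equals the squared volume of the simplex up to a fixed factor.

    import numpy as np
    from itertools import combinations

    def volume_sample(A, k, seed=None):
        # Hypothetical brute-force sketch: choose a k-subset S of row indices
        # with probability proportional to det(A_S A_S^T), i.e. the squared
        # volume of the simplex spanned by those rows and the origin.
        rng = np.random.default_rng(seed)
        subsets = list(combinations(range(A.shape[0]), k))
        vols = np.array([np.linalg.det(A[list(S)] @ A[list(S)].T) for S in subsets])
        vols = np.clip(vols, 0.0, None)           # clear tiny negative round-off
        return subsets[rng.choice(len(subsets), p=vols / vols.sum())]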
Combining volume sampling with adaptive sampling yields the existence of a set of k + k(k+1)/ε rows whose span contains the rows of a multiplicative (1+ε)-approximation. This leads to a PTAS for the following NP-hard projective clustering problem: Given a set P of points in R^d and integers j and k, find subspaces F_1, ..., F_j, each of dimension at most k, that minimize ∑_{p∈P} min_i d(p, F_i)^2.
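The objective is straightforward to evaluate once candidate subspaces are fixed. A small sketch under our own conventions (each subspace F_i represented by a matrix with orthonormal rows, passing through the origin):

    import numpy as np

    def projective_clustering_cost(P, subspaces):
        # P: points as the rows of an array; subspaces: list of matrices with
        # orthonormal rows, each spanning one candidate subspace F_i.
        # For B_i with orthonormal rows, d(p, F_i)^2 = ||p||^2 - ||B_i p||^2.
        return sum(min(p @ p - (B @ p) @ (B @ p) for B in subspaces) for p in P)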