Greedy Subspace Clustering

(Joint work with Constantine Caramanis and Sujay Sanghavi)
Subspace clustering is the problem of fitting a collection of highdimensional data points to a union of subspaces. This problem can be regarded as ‘‘Mixture of PCA’’ because the points are to be partitioned into subsets each of which is modeled by a lowdimensional subspace. This model naturally arises in settings where data from multiple latent phenomena are mixed together and need to be separated.
Recent algorithms for subspace clustering are based on two steps. [13,5,6] At the first step, a number of neighbor points are collected for each data point. Then the points are segmented using spectral clustering. (See the table below.) The Stateoftheart ones solve a convex program with size as large as the squared number of data points. [13,7] As the amount of the data increases, the computational cost becomes critical. To reduce the cost, we proposed greedy algorithms.
Our algorithms are provable in the sense that the exactly correct clustering is guaranteed under certain conditions in the standard models. For example, when dimensional subspaces are drawn uniformly at random in , and random data points are iid uniformly generated on each subspace, exact clustering is guaranteed with high probability if and . This condition is no worse than the existing statistical guarantees.
In practice, our greedy algorithm, which is in general significantly faster than solving a convex program, performs competitively against the algorithms on realworld benchmark datasets. We observed that our algorithm outperforms the existing greedy algorithms. [5,6]
For more detailed results in this project, please see our paper.

 SSC[1,2]  LRR[3]  SSCOMP[5]  TSC[6]  LRSSC[7]  NSN+Spectral  NSN+GSR 
Step 1. Neighborhood Construction  min.  Nuclear norm min.  OMP  Thresholding  Hybrid & Nuclear norm min.  NSN  NSN 
Step 2. Clustering  Spectral Clustering  GSR

Applications
Subspace clustering has diverse applications such as computer vision(motion segmenation, face clustering, etc), system identification, gene expression analysis.
Motion segmentation : Given a collection of feature points extracted from a video sequence, the points of a rigidly moving object lies on a lowdimensional subspace. Based on this observation, we can segment the feature points with respect to the objects.
(reference: Hopkins155 dataset [8])
Face clustering : Under varying illumination conditions, the images of a face can be modeled by a lowdimensional subspace. Using subspace clustering algorithms, we can segment the images of different faces with respect to the faces.
(reference: Extended Yale B dataset [9,10])
Paper
Codes
Matlab codes
This package contains matlab functions for our algorithm, and the scripts that reproduce the tables and the figures in our paper. To run the scripts, you have to download the matlab codes of the competing algorithms from their corresponding links. Please read README.txt and follow the instructions.
References
E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intelligence, 35(11):2765–2781, 2013.
M. Soltanolkotabi and E. J. Candes. A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195–2238, 2012.
G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by lowrank representation. IEEE Trans. Pattern Anal. Mach. Intelligence, 35(1):171–184, 2013.
G. Chen and G. Lerman. Spectral curvature clustering. International Journal of Computer Vision, 81(3):317–330, 2009.
E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk. Greedy feature selection for subspace clustering. The Journal of Machine Learning Research (JMLR), 14(1):2487–2517, 2013.
R. Heckel and H. Bolcskei. Robust subspace clustering via thresholding. arXiv preprint, arXiv:1307.4891v2, 2014.
Y.X. Wang and H. Xu. Provable subspace clustering: When LRR meets SSC. In Proceedings of Neural Information Processing Systems (NIPS), 2013.
R. Tron and R. Vidal. A benchmark for the comparison of 3d motion segmentation algorithms. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2007.
A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intelligence, 23(6):643–660, 2001.
K. C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684–698, 2005.
