Greedy Subspace Clustering

Unions of subspaces 

(Joint work with Constantine Caramanis and Sujay Sanghavi)

Subspace clustering is the problem of fitting a collection of high-dimensional data points to a union of subspaces. This problem can be regarded as ‘‘Mixture of PCA’’ because the points are to be partitioned into subsets each of which is modeled by a low-dimensional subspace. This model naturally arises in settings where data from multiple latent phenomena are mixed together and need to be separated.

Recent algorithms for subspace clustering are based on two steps. [1-3,5,6] At the first step, a number of neighbor points are collected for each data point. Then the points are segmented using spectral clustering. (See the table below.) The State-of-the-art ones solve a convex program with size as large as the squared number of data points. [1-3,7] As the amount of the data increases, the computational cost becomes critical. To reduce the cost, we proposed greedy algorithms.

Our algorithms are provable in the sense that the exactly correct clustering is guaranteed under certain conditions in the standard models. For example, when L d-dimensional subspaces are drawn uniformly at random in mathcal{R}^p, and n random data points are iid uniformly generated on each subspace, exact clustering is guaranteed with high probability if n = Omega(d) and frac{d}{p} = O left( frac{log n}{log (ndL)} right). This condition is no worse than the existing statistical guarantees.

In practice, our greedy algorithm, which is in general significantly faster than solving a convex program, performs competitively against the algorithms on real-world benchmark datasets. We observed that our algorithm outperforms the existing greedy algorithms. [5,6]

For more detailed results in this project, please see our paper.

SSC[1,2] LRR[3] SSC-OMP[5] TSC[6] LRSSC[7] NSN+Spectral NSN+GSR
Step 1. Neighborhood Construction ell_1-min. Nuclear norm min. OMP Thresholding Hybrid ell_1 & Nuclear norm min. NSN NSN
Step 2. Clustering Spectral Clustering GSR

Applications

Subspace clustering has diverse applications such as computer vision(motion segmenation, face clustering, etc), system identification, gene expression analysis.

  • Motion segmentation : Given a collection of feature points extracted from a video sequence, the points of a rigidly moving object lies on a low-dimensional subspace. Based on this observation, we can segment the feature points with respect to the objects.

Motion Segmentation 1   Motion Segmentation 2   Motion Segmentation 3   Motion Segmentation 4  
(reference: Hopkins155 dataset [8])

  • Face clustering : Under varying illumination conditions, the images of a face can be modeled by a low-dimensional subspace. Using subspace clustering algorithms, we can segment the images of different faces with respect to the faces.

Face clustering 1   Face clustering 2   Face clustering 3  
(reference: Extended Yale B dataset [9,10])

Paper

Codes

  • Matlab codes

    • This package contains matlab functions for our algorithm, and the scripts that reproduce the tables and the figures in our paper. To run the scripts, you have to download the matlab codes of the competing algorithms from their corresponding links. Please read README.txt and follow the instructions.

References

  1. E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intelligence, 35(11):2765–2781, 2013.

  2. M. Soltanolkotabi and E. J. Candes. A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195–2238, 2012.

  3. G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intelligence, 35(1):171–184, 2013.

  4. G. Chen and G. Lerman. Spectral curvature clustering. International Journal of Computer Vision, 81(3):317–330, 2009.

  5. E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk. Greedy feature selection for subspace clustering. The Journal of Machine Learning Research (JMLR), 14(1):2487–2517, 2013.

  6. R. Heckel and H. Bolcskei. Robust subspace clustering via thresholding. arXiv preprint, arXiv:1307.4891v2, 2014.

  7. Y.-X. Wang and H. Xu. Provable subspace clustering: When LRR meets SSC. In Proceedings of Neural Information Processing Systems (NIPS), 2013.

  8. R. Tron and R. Vidal. A benchmark for the comparison of 3-d motion segmentation algorithms. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2007.

  9. A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intelligence, 23(6):643–660, 2001.

  10. K. C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684–698, 2005.