An ℓp theory of PCA and spectral clustering
This event is in-person and open only to Princeton University ID holders
We develop an ℓp perturbation theory for a hollowed version of PCA in Hilbert spaces, which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel ℓp analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in ℓp norm. For sub-Gaussian mixture models, the choice of p in the theoretical analysis depends on the signal-to-noise ratio, which in turn yields optimality guarantees for spectral clustering. For contextual community detection, the ℓp theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. This provides optimal recovery results for the stochastic block model and the Gaussian mixture model as special cases. (Joint work with Emmanuel Abbe and Kaizheng Wang.)
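As a rough illustration of the idea in the abstract, the sketch below zeroes the diagonal of the sample Gram matrix ("hollowing"), takes its leading eigenvector, and splits a two-component mixture by sign. It is a minimal sketch under stated assumptions, not the speaker's algorithm: the function names, the synthetic data, the per-sample noise scales, and the sign-based clustering rule are all hypothetical choices made for this example.

```python
import numpy as np

def hollowed_pca_scores(X, r):
    """Top-r eigenpairs of the Gram matrix X X^T with its diagonal set to zero."""
    G = X @ X.T                       # n x n Gram matrix of the samples
    np.fill_diagonal(G, 0.0)          # hollowing: discard the diagonal entries
    vals, vecs = np.linalg.eigh(G)    # symmetric eigendecomposition, ascending order
    top = np.argsort(np.abs(vals))[::-1][:r]   # r eigenvalues largest in magnitude
    return vals[top], vecs[:, top]

def two_cluster_labels(X):
    """Assumed simplification: cluster by the sign of the leading eigenvector."""
    _, U = hollowed_pca_scores(X, 1)
    return np.where(U[:, 0] >= 0, 1, -1)

# Synthetic two-component mixture (illustrative values only).
rng = np.random.default_rng(0)
n, d = 200, 50
y = rng.choice([-1, 1], size=n)                  # hidden labels
mu = np.full(d, 0.6)                             # cluster centers are +mu and -mu
noise_scale = rng.uniform(0.5, 1.5, size=n)      # noise level differs per sample
X = y[:, None] * mu[None, :] + rng.normal(size=(n, d)) * noise_scale[:, None]

labels = two_cluster_labels(X)
# Labels are only identified up to a global sign flip.
accuracy = max(np.mean(labels == y), np.mean(labels == -y))
print(f"clustering accuracy: {accuracy:.3f}")
```

The diagonal of the Gram matrix carries the squared norm of each sample's noise, which varies from sample to sample when the noise is heteroscedastic; dropping it is what distinguishes this sketch from an eigendecomposition of the plain Gram matrix.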
Jianqing Fan is the Frederick L. Moore Professor at Princeton University. After receiving his Ph.D. from the University of California at Berkeley, he was appointed professor at the University of North Carolina at Chapel Hill (1989-2003), the University of California at Los Angeles (1997-2000), and Princeton University (2003-present). He is a past president of the Institute of Mathematical Statistics and the International Chinese Statistical Association. He is a co-editor of the Journal of Business & Economic Statistics and was a co-editor of The Annals of Statistics, Probability Theory and Related Fields, and the Journal of Econometrics. His published work on statistics, economics, finance, and computational biology has been recognized by the 2000 COPSS Presidents' Award, the 2007 Morningside Gold Medal of Applied Mathematics, a Guggenheim Fellowship, the P.L. Hsu Prize, the Royal Statistical Society Guy Medal in Silver, the Noether Senior Scholar Award, and election as an Academician of Academia Sinica and as a fellow of IMS, ASA, AAAS, and SoFiE.