Estimating the Covariance Spectrum

-
Weihao Kong, Stanford University
TBD

Please note special day (Tuesday).   Suppose one wishes to accurately recover the set of eigenvalues of the covariance matrix of a distribution, given access to samples from the distribution. To what extent can this set be accurately approximated given an amount of data that is sublinear in the dimension of the space? The naive empirical estimate has poor performance when the number of samples is linear or sub-linear in the dimension of the data. In the "large sample, large dimension" setting, we proposed an efficient and information theoretically near-optimal algorithm to learn the moments of the covariance spectral distribution. Further we show that given the first k moments of a distribution, we can pin down the distribution in Earthmover distance up to an error of O(1/k). These two results combined allow us to efficiently and accurately learn the spectrum of the underlying covariance matrix, yielding a consistent spectrum estimator even with sub-linear number of samples.