The Mathematical Theory of Neural Network-based Machine Learning
The task of supervised learning is to approximate a function using a given set of data. In low dimensions, the mathematical theory has been established in classical numerical analysis and approximation theory, in which the function spaces of interest (the Sobolev or Besov spaces), the order of the error and the convergence rate of gradient-based algorithms are all well understood. A direct extension of such a theory to high dimensions leads to estimates that suffer from the curse of dimensionality as well as degeneracy in the over-parametrized regime.
In this talk, we attempt to put forward a unified mathematical framework for analyzing neural network-based machine learning in high dimensions (and in the over-parametrized regime). We illustrate this framework using kernel methods, shallow network models and deep network models. For each of these methods, we identify the right function spaces (those for which optimal complexity estimates and direct and inverse approximation theorems hold), prove optimal a priori generalization error estimates, and study the behavior of gradient descent dynamics.
The talk is based mostly on joint work with Chao Ma, Lei Wu and Qingcan Wang.