Unsupervised Ensemble Regression: Making accurate predictions while knowing (almost) nothing
In many applications, one needs to merge the predictions of multiple experts while knowing little about their accuracy. Examples include finance, where different analysts provide 1-year target prices for multiple stocks; biology and medicine, where several research groups each construct their own algorithm to predict the efficacy of various drugs on different patients, say based on their genetic profiles; and seismology, where the strengths of earthquakes are estimated from the signals measured at different stations. Simple merging schemes, such as majority voting in classification or the ensemble mean or median in regression, are clearly sub-optimal when some experts are far more accurate than others. In this talk we focus on the regression case, and propose a framework to estimate the mean squared error of the different experts and to combine their predictions into a more accurate meta-learner, all without ground-truth data. Our key assumptions are that the deviations of different experts from the optimal predictor are uncorrelated and that the first moment of the response is known.
We show that the covariance matrix of the experts' predictions has a particular low-rank structure, and derive U-PCR, a novel principal-components method for unsupervised ensemble regression.
We provide theoretical support for U-PCR and, on a variety of regression problems, illustrate its improved accuracy over various unsupervised merging strategies.
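To make the uncorrelated-deviations assumption concrete, the following is a minimal sketch on simulated data. It is not U-PCR. It assumes a simplified additive model, chosen here purely for illustration, in which each expert's prediction equals a shared optimal predictor plus independent zero-mean noise. Under that assumption every off-diagonal entry of the experts' covariance matrix equals the variance of the shared predictor, so subtracting the average off-diagonal entry from the diagonal gives a rough estimate of each expert's deviation variance without any ground-truth responses. The function names, noise levels, and the inverse-variance weighting are all hypothetical choices.

```python
import numpy as np


def simulate_experts(n_samples=20000, noise_sds=(0.5, 1.0, 1.5, 2.0), seed=0):
    """Toy data: every expert = shared predictor g + independent noise (assumed model)."""
    rng = np.random.default_rng(seed)
    g = rng.normal(0.0, 1.0, size=n_samples)  # stand-in for the optimal predictor
    sds = np.asarray(noise_sds, dtype=float)
    deviations = rng.normal(size=(len(sds), n_samples)) * sds[:, None]
    predictions = g + deviations  # shape (n_experts, n_samples)
    return predictions, g, sds ** 2


def estimate_deviation_variances(predictions):
    """Estimate each expert's deviation variance from predictions alone.

    Under the toy model, Cov(f_i, f_j) = Var(g) for i != j and
    Var(f_i) = Var(g) + Var(e_i), so diag(C) minus the mean
    off-diagonal entry estimates Var(e_i).
    """
    C = np.cov(predictions)  # rows are experts, columns are samples
    m = C.shape[0]
    off_diagonal = C[~np.eye(m, dtype=bool)]
    shared_var = off_diagonal.mean()
    return np.diag(C) - shared_var, shared_var


predictions, g, true_vars = simulate_experts()
est_vars, shared_var = estimate_deviation_variances(predictions)

# Inverse-variance weights; clip guards against tiny or negative estimates.
inv = 1.0 / np.clip(est_vars, 1e-12, None)
weights = inv / inv.sum()
combined = weights @ predictions

mse_combined = np.mean((combined - g) ** 2)
mse_mean = np.mean((predictions.mean(axis=0) - g) ** 2)
print("estimated deviation variances:", np.round(est_vars, 3))
print("MSE vs shared predictor, weighted:", mse_combined, "plain mean:", mse_mean)
```

In this toy setting the estimated variances recover the ranking of the experts, and the weighted combination tracks the shared predictor more closely than the plain ensemble mean. Real experts need not follow the additive model, which is where a method with theoretical support, such as the one described in the talk, is needed.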