Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction
Published: 2018-06-12
Formatted citation
Spjuth O, Carlsson L, Gauraha N. Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction. arXiv:1806.04000 (2018).
URL: arxiv.org/abs/1806.04000
Abstract
Conformal Prediction is a machine learning methodology that produces valid prediction regions under mild conditions. In this paper, we explore the application of making predictions over multiple data sources of different sizes without disclosing data between the sources. We propose that each data source applies a transductive conformal predictor independently using the local data, and that the individual predictions are then aggregated to form a combined prediction region. We demonstrate the method on several data sets, and show that the proposed method produces conservatively valid predictions and reduces the variance in the aggregated predictions. We also study the effect that the number of data sources and the size of each source have on the aggregated predictions, as compared with equally sized sources and pooled data.
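The workflow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional data, the nearest-centroid nonconformity score, the function names, and in particular the aggregation rule (the median of the per-source p-values). Each source computes transductive conformal p-values from its own data only, and only those p-values are shared.

```python
from statistics import median

def nonconformity(examples, idx):
    """Distance from example idx to the centroid of the other examples with its label.
    Assumed score for illustration; any nonconformity measure could be used."""
    x, y = examples[idx]
    peers = [xi for j, (xi, yi) in enumerate(examples) if yi == y and j != idx]
    if not peers:
        return float("inf")
    centroid = [sum(col) / len(peers) for col in zip(*peers)]
    return sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5

def tcp_p_values(local_data, x_new, labels):
    """Transductive conformal p-value for each candidate label, using one source's data."""
    p = {}
    for y in labels:
        augmented = local_data + [(x_new, y)]  # tentatively assign label y
        scores = [nonconformity(augmented, i) for i in range(len(augmented))]
        # Fraction of examples at least as nonconforming as the new one.
        p[y] = sum(s >= scores[-1] for s in scores) / len(augmented)
    return p

def aggregate(p_by_source, labels):
    """Combine per-source p-values; the median is an assumed rule, not the paper's."""
    return {y: median(p[y] for p in p_by_source) for y in labels}

def prediction_region(p, significance):
    """Labels whose aggregated p-value exceeds the significance level."""
    return {y for y, pv in p.items() if pv > significance}

# Three hypothetical sources of different sizes; raw data never leaves a source.
source_a = [([0.0], 0), ([1.0], 0), ([0.5], 0), ([10.0], 1), ([11.0], 1), ([10.5], 1)]
source_b = [([0.2], 0), ([0.8], 0), ([9.8], 1), ([10.2], 1)]
source_c = [([0.1], 0), ([0.9], 0), ([0.4], 0), ([0.6], 0),
            ([9.9], 1), ([10.1], 1), ([10.6], 1), ([10.4], 1)]

labels = [0, 1]
x_new = [0.5]
per_source = [tcp_p_values(s, x_new, labels) for s in (source_a, source_b, source_c)]
agg = aggregate(per_source, labels)
region = prediction_region(agg, significance=0.2)
print(agg, region)
```

Note that the smallest p-value a source can report is 1/(n+1), so smaller sources give coarser p-values. This sketch makes no claim about whether median aggregation preserves validity; the abstract's validity statement refers to the authors' own method.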