Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence


Published: 2015-12-07

Formatted citation

Capuccini M, Carlsson L, Norinder U, Spjuth O. Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence.
Proceedings - 2015 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, 61-67 (2015). DOI: 10.1109/BDC.2015.35


The increasing size of datasets is challenging for machine learning, and Big Data frameworks such as Apache Spark have shown promise for facilitating model building on distributed resources. Conformal prediction is a mathematical framework that assigns valid confidence levels to object-specific predictions. This contrasts with current best practice, where the overall confidence level for predictions on unseen objects is estimated from previous performance, assuming exchangeability. Here we report a Spark-based distributed implementation of conformal prediction, which introduces valid confidence estimation in predictive modeling for Big Data analytics. Experimental results on two large-scale datasets show the validity and the scalability of the method, which is freely available as open source. © 2015 ACM.
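
To illustrate what an object-specific confidence level means, here is a minimal single-machine sketch of the inductive (split) variant of conformal prediction in plain Python. It is not the paper's Spark implementation. The function names, the nonconformity scores, and the labels "active" and "inactive" are hypothetical and chosen only for illustration. Each candidate label gets a p-value from the rank of its nonconformity score among held-out calibration scores, and the prediction set keeps every label whose p-value exceeds the chosen significance level.

```python
def conformal_p_values(calibration_scores, test_scores_by_label):
    """Compute one p-value per candidate label for a single test object.

    calibration_scores: nonconformity scores of held-out calibration examples
        (assumed to come from some underlying model; higher means stranger).
    test_scores_by_label: hypothetical mapping from each candidate label to
        the nonconformity score of the test object under that label.
    """
    n = len(calibration_scores)
    p_values = {}
    for label, score in test_scores_by_label.items():
        # Count calibration examples at least as nonconforming as the test object.
        at_least_as_strange = sum(1 for s in calibration_scores if s >= score)
        # The +1 terms count the test object itself.
        p_values[label] = (at_least_as_strange + 1) / (n + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every label that cannot be rejected at the given significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Illustrative, made-up scores (not data from the paper).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p = conformal_p_values(calibration, {"active": 0.15, "inactive": 0.95})
labels = prediction_set(p, significance=0.2)
```

Under the exchangeability assumption, a prediction set built this way excludes the true label with probability at most the significance level. That per-object guarantee is the sense in which the confidence is valid.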