Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence

Published: 2015-12-07

Formatted citation

Capuccini M, Carlsson L, Norinder U, Spjuth O. Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence.
Proceedings - 2015 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, 61-67 (2015). DOI: 10.1109/BDC.2015.35

Abstract

The increasing size of datasets is challenging for machine learning, and Big Data frameworks such as Apache Spark have shown promise for facilitating model building on distributed resources. Conformal prediction is a mathematical framework that assigns valid confidence levels to object-specific predictions. This contrasts with current best practices, where the overall confidence level for predictions on unseen objects is estimated from previous performance, assuming exchangeability. Here we report a Spark-based distributed implementation of conformal prediction, which introduces valid confidence estimation in predictive modeling for Big Data analytics. Experimental results on two large-scale datasets show the validity and the scalability of the method, which is freely available as open source. © 2015 ACM.
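
Illustrative sketch

To make the idea of object-specific confidence concrete, the following is a minimal single-machine sketch of an inductive, label-conditional conformal classifier. It is not the paper's Spark implementation. The function names `p_value` and `prediction_set`, the class labels, and all nonconformity scores are hypothetical and chosen only for illustration. The sketch assumes that a held-out calibration set has already been scored with some nonconformity measure, where a higher score means the example looks less typical of its label.

```python
from typing import Dict, List


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Conformal p-value: share of scores at least as nonconforming as the
    test score, counting the test example itself (hence the +1 terms)."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores: Dict[str, List[float]],
                   test_scores: Dict[str, float],
                   significance: float) -> List[str]:
    """Return every label whose p-value exceeds the significance level."""
    return sorted(
        label
        for label, score in test_scores.items()
        if p_value(calibration_scores[label], score) > significance
    )


# Hypothetical nonconformity scores of calibration examples, grouped by label.
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.2, 0.5, 0.6, 0.9],
}

# Hypothetical nonconformity of one new object under each candidate label.
test = {"active": 0.15, "inactive": 0.95}

result = prediction_set(calibration, test, significance=0.25)
print(result)
```

With these made-up scores, the new object conforms well to the "active" calibration examples and poorly to the "inactive" ones, so only "active" remains in the prediction set at significance 0.25. Lowering the significance level (asking for higher confidence) can only enlarge the set.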