Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence
Capuccini M, Carlsson L, Norinder U, Spjuth O.
Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence.
Proceedings - 2015 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, 61-67 (2015). DOI: 10.1109/BDC.2015.35
The increasing size of datasets is challenging for machine learning, and Big Data frameworks such as Apache Spark have shown promise for facilitating model building on distributed resources. Conformal prediction is a mathematical framework that assigns valid confidence levels to object-specific predictions. This contrasts with current best practices, where the overall confidence level for predictions on unseen objects is estimated from previous performance, assuming exchangeability. Here we report a Spark-based distributed implementation of conformal prediction, which introduces valid confidence estimation in predictive modeling for Big Data analytics. Experimental results on two large-scale datasets show the validity and the scalability of the method, which is freely available as open source. © 2015 ACM.
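To make the idea of object-specific confidence concrete, the following is a minimal single-machine sketch of inductive conformal classification in plain Python. It is not the Spark implementation reported in the paper. The inductive variant, the nonconformity measure (one minus the predicted probability of a label), the `toy_model` function, and the calibration data are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# A "model" here is any function mapping an input to class probabilities.
ProbFn = Callable[[float], Dict[str, float]]


def toy_model(x: float) -> Dict[str, float]:
    """Hypothetical stand-in for a trained classifier (assumption)."""
    p = min(max(x, 0.0), 1.0)
    return {"pos": p, "neg": 1.0 - p}


def nonconformity(probs: Dict[str, float], label: str) -> float:
    """Assumed nonconformity measure: low probability means 'strange'."""
    return 1.0 - probs[label]


def calibrate(model: ProbFn, calibration: List[Tuple[float, str]]) -> List[float]:
    """Score each held-out calibration example against its true label."""
    return [nonconformity(model(x), y) for x, y in calibration]


def p_values(model: ProbFn, cal_scores: List[float], x: float) -> Dict[str, float]:
    """For each candidate label, the fraction of calibration scores
    that are at least as nonconforming as the new example."""
    probs = model(x)
    n = len(cal_scores)
    result = {}
    for label in probs:
        score = nonconformity(probs, label)
        at_least = sum(1 for s in cal_scores if s >= score)
        result[label] = (at_least + 1) / (n + 1)
    return result


def prediction_set(model: ProbFn, cal_scores: List[float],
                   x: float, epsilon: float) -> Set[str]:
    """Labels that cannot be rejected at significance level epsilon."""
    return {lab for lab, p in p_values(model, cal_scores, x).items() if p > epsilon}


# Made-up calibration data (assumption), kept separate from model training.
calibration_data = [
    (0.9, "pos"), (0.8, "pos"), (0.2, "neg"), (0.1, "neg"), (0.6, "pos"),
    (0.4, "neg"), (0.7, "neg"), (0.3, "pos"), (0.95, "pos"),
]
cal_scores = calibrate(toy_model, calibration_data)

if __name__ == "__main__":
    for eps in (0.2, 0.05):
        print(eps, sorted(prediction_set(toy_model, cal_scores, 0.99, eps)))
```

Under exchangeability, the prediction set contains the true label with probability at least 1 - epsilon, so lowering epsilon can only enlarge the set. In a distributed setting, the model training and calibration scoring steps are natural candidates for parallelization, although how the paper distributes them is not described in this abstract.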