NDCP: Non-Disclosed Conformal Prediction
This is an implementation for ligand-based modeling of the method described in:
Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction
Ola Spjuth, Lars Carlsson, Niharika Gauraha
It is a common objective for different organizations to be able to contribute to improved predictive models, such as in pre-competitive alliances, but there is often problems with sharing internal data between the parties. One example is in the pharmaceutical industry, where predictions on existing data (such as measured assays of various hazard endpoints for chemical compounds) constitute valuable assets and it would be desirable for all companies to have as good predictive models as possible; but sharing data between companies is usually not so easy.
The idea of the method is that the different parties train an individual model of their local data, and make this available as a web service accepting a SMILES as input, and delivers results in form of p-values (one per each class in the classification case), and a prediction interval (in the regression case). No other information is transferred over the network, making the implementation preserve privacy completely. The results from the contributing parties is then merged as described in Spjuth et al 2018 (https://arxiv.org/abs/1806.04000) to yield predictions with lower variance and improved efficiency.
NDCP is available as two docker container images. One docker image (‘Deploy model’) is deployed on all sites that will serve predictions (without disclosing data); there is no limit to how many different instances that can be deployed. The second docker image (‘Prediction server”) contains a server that aggregates p-values and makes available a web page that can be used to consume the service.
Docker images can be downloaded from:
The “model Docker” builds a model from a file with molecules (accepts .sdf or tab separated .tsv with SMILES and property).
Build and deploy model
docker load < deploy-model.tar.gz
Create a data folder
mkdir <data dir> and copy SD-file and cpsign-license into the folder.
Some CPSign specific settings are needed:
||Part of training set to use for calibration|
||Name of property to model|
||Values of property to model|
||Specify that we are doing classification|
||Number of models to build|
||Name of the model that is built|
These parameters are added at the end when building, see below:
docker run -d -p 8080:8080 -v <absolute data dir>:/build/source <image name:tag> \ --calibration-ratio 0.20 --nr-models 1 --labels "A,N" --logfile /build/full.log \ --cptype 1 --response-name activity --model-name "TestMod"
Deploy webpage to view aggregation
docker load < ndcp-predict.tar.gz docker run -p 8080:8083
A running example is deployed at http://ndcp.service.pharmb.io
The implementation uses CPSign (http://cpsign-docs.genettasoft.com/) as the underlying modeling method with SVM and Signatures. A license is required but these are generously provided by GenettaSoft.