Machine learning strategies when transitioning between biological assays

Published: 2021-06-03

Formatted citation

Arvidsson McShane S, Ahlberg E, Noeske T, and Spjuth O. Machine learning strategies when transitioning between biological assays.
Journal of Chemical Information and Modeling. 61, 7, 3722-3733. (2021). DOI: 10.1021/acs.jcim.1c00293

Abstract

Machine learning is widely used in drug development to predict activity in biological assays based on chemical structure. However, the process of transitioning from one experimental setup to another for the same biological endpoint has not been extensively studied. In this retrospective study, we explore different strategies for combining data from the old and new assays when training conformal prediction models, using data from hERG and NaV assays. We suggest continuously monitoring the validity and efficiency of the models as more data accumulates from the new assay, and selecting a modeling strategy based on these metrics. To maximize the utility of data from the old assay, we propose a strategy that augments the proper training set of an inductive conformal predictor with data from the old assay while keeping only data from the new assay in the calibration set. This results in valid (well-calibrated) models with improved efficiency compared to the other strategies. We study the results for varying sizes of the new and old assay data sets, allowing for discussion of different practical scenarios. We also conclude that the proposed assay transition strategy is more beneficial, and the value of data from the new assay is higher, for the harder case of regression than for classification problems.
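The proposed strategy can be illustrated with a minimal sketch of an inductive conformal regressor. This is not the authors' implementation: the underlying model is a one-dimensional least-squares line chosen only to keep the example self-contained, the nonconformity score is the absolute residual, and all function names (fit_line, conformal_half_width, transition_icp, validity_and_efficiency) are hypothetical. What it shows is the split described above: old-assay data enters only the proper training set, the calibration set holds new-assay data only, and validity and efficiency are measured on held-out new-assay data.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def conformal_half_width(model, calibration, significance):
    """Half-width of the prediction interval from absolute calibration residuals."""
    intercept, slope = model
    scores = sorted(abs(y - (intercept + slope * x)) for x, y in calibration)
    n = len(scores)
    # Rank of the calibration score that gives coverage >= 1 - significance.
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        return math.inf  # too few calibration points for this significance level
    return scores[k - 1]


def transition_icp(old_assay, new_train, new_calib, significance=0.2):
    """Train on old + new assay data; calibrate on new assay data only."""
    model = fit_line(old_assay + new_train)  # augmented proper training set
    half_width = conformal_half_width(model, new_calib, significance)

    def predict(x):
        center = model[0] + model[1] * x
        return center - half_width, center + half_width

    return predict


def validity_and_efficiency(predict, new_test):
    """Empirical coverage (validity) and mean interval width (efficiency)."""
    intervals = [predict(x) for x, _ in new_test]
    covered = sum(lo <= y <= hi for (lo, hi), (_, y) in zip(intervals, new_test))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(new_test), mean_width


# Toy data (made up for illustration): (descriptor, measured activity) pairs.
old_assay = [(0.0, 1.0), (2.0, 5.0)]
new_train = [(1.0, 3.0)]
new_calib = [(0.0, 2.0), (1.0, 3.0), (2.0, 8.0)]

predict = transition_icp(old_assay, new_train, new_calib, significance=0.5)
print(predict(3.0))  # (6.0, 8.0)
```

Re-running validity_and_efficiency as new-assay data accumulates is one way to carry out the monitoring suggested above: coverage should stay close to 1 - significance, and narrower intervals at the same coverage indicate a more efficient model.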