Machine Learning for Toxicity Prediction Using Chemical Structure: Pillars for Success in the Real World
Published: 2025-05-02
Formatted citation
Seal S, Mahale M, Garcia-Ortegon M, Joshi C, Hosseini-Gerami L, Beatson A, Greenig M, Shekhar M, Patra A, Weis C, Mehrjou A, Badré A, Paisley B, Lowe R, Singh S, Shah F, Johannesson B, Williams D, Rouquié D, Clevert DA, Schwab P, Richmond N, Nicolaou C, Gonzalez R, Naven R, Schramm C, Vidler L, Mansouri K, Walters WP, Dalmas Wilk D, Spjuth O, Carpenter AE, and Bender A.
Machine Learning for Toxicity Prediction Using Chemical Structure: Pillars for Success in the Real World.
Chemical Research in Toxicology.
(2025).
DOI: 10.1021/acs.chemrestox.5c00033
Abstract
Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.