Large-scale Predictive Modelling in Drug Discovery

This project aims to develop computational methods, tools and predictive models that aid the drug discovery process on large data sets. We use both ligand-based and structure-based methods, such as QSAR (machine learning) and docking, with applications including prediction of drug safety, toxicology, interactions, target profiles and secondary pharmacology. To analyze large-scale data we use high-performance computing, cloud computing resources, and data analytics platforms such as Apache Hadoop and Apache Spark. We also use and develop scientific workflow systems such as Luigi and BPipe to automate and streamline analysis. The work is carried out in collaboration with AstraZeneca R&D, Maastricht University (NL), and Karolinska Institutet. We aim to make models and tools available from the Bioclipse workbench. We are also founding partners of the OpenTox association and associated partners of the consortia OpenPhacts and e-nanomapper.

Figure: Data is extracted from various data sources. We use high-performance computing, cloud computing, workflows and big data frameworks to train predictive models, which are published in the Bioclipse workbench for easy, user-friendly access with graphical interpretations.
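
To give a flavour of what a ligand-based predictive model looks like, here is a minimal sketch in Python. It is not the project's actual method or code: it assumes compounds are already encoded as sets of "on" fingerprint bits, and it uses a simple nearest-neighbour vote with Tanimoto similarity as a stand-in for the machine learning step. The function names, the labels and the toy data are all hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Predict a label by majority vote among the k most similar training compounds."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (fingerprint bits, label). Not real data.
TRAINING = [
    (frozenset({1, 2, 3, 4}), "toxic"),
    (frozenset({1, 2, 3, 5}), "toxic"),
    (frozenset({2, 3, 4, 6}), "toxic"),
    (frozenset({10, 11, 12}), "non-toxic"),
    (frozenset({10, 11, 13}), "non-toxic"),
    (frozenset({11, 12, 14}), "non-toxic"),
]

if __name__ == "__main__":
    print(knn_predict(frozenset({1, 2, 3}), TRAINING))
    print(knn_predict(frozenset({10, 11}), TRAINING))
```

In a large-scale setting, the same idea would typically be expressed with a cheminformatics toolkit for the fingerprints and a distributed framework for training, with each step wrapped as a task in a workflow system.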