Large-scale Predictive Modelling in Drug Discovery

This project aims to develop computational methods, tools and predictive models that aid the drug discovery process using large data sets. Our methods include ligand-based approaches such as QSAR (machine learning) and structure-based approaches such as docking. Applications include prediction of drug safety, toxicology, interactions, target profiles and secondary pharmacology. To analyze large-scale data, we make use of modern e-infrastructure such as high-performance computing clusters, cloud computing resources, containerized microservice environments such as Kubernetes, and data analytics platforms such as Apache Spark.

Figure: Data is extracted from various sources, and we use high-performance computing, cloud computing, workflows and big data frameworks to train predictive models. These models are deployed and served in microservice environments via interoperable APIs and easy-to-use GUIs.

We also use and develop scientific workflow systems such as SciLuigi, SciPipe, and Pachyderm to automate and streamline analyses. The work is carried out in collaboration with AstraZeneca R&D and SweTox. We are strong promoters of open science and strive to publish all data and models online.