Abstract: | Many organizations in the pharmaceutical industry have a continuous flow of new experimental data from in vitro assays, and place high value on providing their scientists with predictive models trained on the latest data. Such model-building pipelines are relatively difficult to automate, and there is also a need to manage the entire modeling life cycle, from model building, testing, and deployment to updates and re-deployment. Further, data sets in toxicology and pharmacology have recently grown greatly in size, owing to high-throughput molecular technologies and large data resources made publicly available by international consortia. Analyzing data sets of this size in many cases requires access to e-infrastructures such as high-performance and cloud computing. This presentation will highlight the challenges and opportunities of working with e-Science infrastructures, and showcase some of our latest developments for large-scale predictive modeling in toxicology and pharmacology, including scientific workflows, cloud computing, and the Apache Spark framework. |