Jon Ander Novella to speak at DevFest Siberia 2017

24 Aug, 2017

Jon Ander Novella will present at DevFest Siberia 2017, on the topic of “Scalable and reproducible bioinformatics workflows with Pachyderm”.

Abstract

Computational biology is complex, and not only because of ever-growing volumes of data: biologists must also manage analyses that span multiple stages and a large number of tools, all while keeping results reproducible.

Among the many tools available for parallel computation, Pachyderm is an open-source workflow engine and distributed data processing tool that builds on the container ecosystem.

In our study, we aimed to build a scalable and reproducible metabolomics workflow using Pachyderm. To this end, we deployed Pachyderm on our Kubernetes cluster, which runs on OpenStack and is backed by a Minio object store and GlusterFS. After testing the solution in the Swedish National Infrastructure for Computing cloud, we showed that it scales well and is a strong alternative for bioinformatics analyses.

Keywords: Pachyderm, Containers, Kubernetes, Scientific Workflows, Bioinformatics, Distributed Computing

Takeaways: Learn how Pachyderm can be used to enable scalable and reproducible workflows on an open-source cloud infrastructure.