Jon Ander Novella to speak at DevFest Siberia 2017
24 Aug, 2017
Computational biology is complex not only because of the growing volume of data, but also because biologists must manage analyses that span multiple stages and many tools while keeping results reproducible.
Among the many tools available for parallel computation, Pachyderm is an open-source workflow engine and distributed data processing tool that builds on the container ecosystem.
In our study, we aimed to enable a scalable and reproducible metabolomics workflow using Pachyderm. To do so, we deployed Pachyderm on our Kubernetes cluster, which runs on OpenStack and is backed by a Minio object store and GlusterFS. After testing the setup in the Swedish National Infrastructure for Computing (SNIC) cloud, we showed that it scales well and is a strong alternative for bioinformatics analyses.
Keywords: Pachyderm, Containers, Kubernetes, Scientific Workflows, Bioinformatics, Distributed Computing
Takeaways: Learn how Pachyderm can be used to enable scalable and reproducible workflows on open-source cloud infrastructure.