HASTE: Hierarchical Analysis of Spatial and TEmporal image data
From intelligent data acquisition via smart data management to confident predictions
The HASTE project takes a hierarchical approach to the acquisition, analysis, and interpretation of image data. We develop computationally efficient measurements for data description, confidence-driven machine learning for determining which data are interesting, and a theory and framework for intelligent spatial and temporal information hierarchies that distribute data to computational resources and storage options based on low-level image features.
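The tiering idea can be illustrated with a minimal sketch: compute a cheap low-level descriptor per image and route the image to a storage/compute tier accordingly. The feature (intensity variance), threshold, and tier names below are hypothetical illustrations, not HASTE's actual pipeline:

```python
# Minimal sketch of feature-based data tiering (hypothetical feature,
# threshold, and tier names -- not the actual HASTE implementation).

def intensity_stats(image):
    """Cheap low-level descriptors: mean and variance of pixel intensities."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, var

def assign_tier(image, var_threshold=100.0):
    """Route an image: high-variance (likely informative) frames go to
    fast storage for full analysis; near-uniform frames go to cold storage."""
    _, var = intensity_stats(image)
    return "fast-analysis" if var > var_threshold else "cold-storage"

blank = [[10, 10], [10, 10]]      # near-uniform frame
textured = [[0, 200], [180, 5]]   # high-contrast frame
print(assign_tier(blank))         # cold-storage
print(assign_tier(textured))      # fast-analysis
```

The point of computing such descriptors in the acquisition stream is that the routing decision is far cheaper than the downstream analysis it gates, so uninteresting data never consumes expensive resources.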
HASTE is a collaboration between the Wählby lab (PI), Hellander lab (co-PI), both at the Department of Information Technology, Uppsala University, the Spjuth lab (co-PI) at the Department of Pharmaceutical Biosciences, Uppsala University, the Nilsson lab at the Department of Biochemistry and Biophysics at Stockholm University and SciLifeLab, Vironova AB and AstraZeneca AB.
Project website: http://haste.research.it.uu.se/
Large-scale Predictive Modelling in Drug Discovery
This project aims to develop computational methods, tools, and predictive models to aid the drug discovery process on large data sets. Methods include ligand-based and structure-based approaches such as QSAR (machine learning) and docking, with applications including prediction of drug safety, toxicology, interactions, target profiles, and secondary pharmacology. To analyze large-scale data we use high-performance computing, cloud computing resources, and data-analytics platforms such as Apache Hadoop and Apache Spark. We also use and develop scientific workflow systems such as Luigi and Bpipe to automate and streamline analysis. The work is carried out in collaboration with AstraZeneca R&D, Maastricht University (NL), and Karolinska Institutet. We aim to make models and tools available from the Bioclipse workbench. We are also founding partners of the OpenTox association (www.opentox.org) and an associated partner of the Open PHACTS (www.openphacts.org) and eNanoMapper (http://www.enanomapper.net) consortia.
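The core idea behind ligand-based (QSAR-style) prediction is that structurally similar molecules tend to have similar properties. A toy sketch, with made-up fingerprints and labels purely for illustration (real work uses far richer fingerprints and models):

```python
# Toy illustration of ligand-based prediction: a 1-nearest-neighbour
# classifier over binary molecular fingerprints using Tanimoto similarity.
# Fingerprints (sets of "on" bit indices) and labels are hypothetical.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def predict(query_fp, training_set):
    """Return the label of the most similar training compound."""
    best_label, best_sim = None, -1.0
    for fp, label in training_set:
        sim = tanimoto(query_fp, fp)
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label, best_sim

training = [
    ({1, 4, 7, 9}, "toxic"),
    ({2, 3, 8}, "non-toxic"),
]
label, sim = predict({1, 4, 7}, training)
print(label, round(sim, 2))  # toxic 0.75
```

In production this pattern scales out naturally: similarity searches and model training over millions of compounds are exactly the embarrassingly parallel workloads that platforms such as Apache Spark handle well.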
Figure: Data is extracted from various data sources, and we use high-performance computing, cloud computing, workflows and big-data frameworks to train predictive models, which are published in the Bioclipse workbench for easy, user-friendly access with graphical interpretations.
Prediction of metabolism
This project develops methods for predicting the site of metabolism and likely metabolites from chemical structure. Using data mining techniques we have developed the tool MetaPrint2D for site-of-metabolism prediction, and we aim to improve these models and to predict putative metabolites. The work is carried out in close collaboration with AstraZeneca R&D, and models and tools are available from the Bioclipse workbench.
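The data-mining idea can be sketched as occurrence-ratio scoring: for each atom environment, count how often it was reported as a site of metabolism relative to how often it occurs at all, then rank atoms by that ratio. The environment names and counts below are invented for illustration; the real MetaPrint2D mines circular atom environments from a large metabolite database.

```python
# Simplified occurrence-ratio scoring in the spirit of MetaPrint2D.
# Counts are hypothetical: (times seen as a site of metabolism, total occurrences).
environment_counts = {
    "aromatic-CH": (450, 1000),
    "aliphatic-CH3": (120, 800),
    "carbonyl-C": (5, 900),
}

def som_score(env):
    """Occurrence ratio: how often this atom environment was a reported
    site of metabolism, relative to how often it occurs overall."""
    metabolised, total = environment_counts[env]
    return metabolised / total

# Rank environments from most to least likely site of metabolism.
ranked = sorted(environment_counts, key=som_score, reverse=True)
print(ranked)  # ['aromatic-CH', 'aliphatic-CH3', 'carbonyl-C']
```

Because the scores are derived from counting rather than from an opaque model, they are easy to visualise directly on the molecule, which is how such predictions are presented in Bioclipse.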
Figure: Prediction of site-of-metabolism with the MetaPrint2D method in Bioclipse.
OpenRiskNet EU-H2020 project
OpenRiskNet is a 3-year EU Horizon 2020 project, starting on December 1st 2016, that will develop and deploy an integrated, secure, permanent, service-driven and sustainable e-infrastructure for data management, data sharing, processing, analysis, information mining and modelling, as well as workflow development and sharing, visualisation and reporting. It serves communities in toxicology, risk assessment, and chemical, pharmaceutical, cosmetic and nanomaterial product development, including safe-by-design aspects at an early stage. This e-infrastructure will support all aspects of risk assessment mentioned above by allowing for the integration of all toxicology-related data sources and for the implementation and execution of processing and analysis pipelines.
OpenRiskNet will address the challenges arising from the fragmentation of data and the insufficient harmonisation of user guidance by creating application programming interfaces (APIs) with technical and semantic interoperability layers, containerizing the databases and computational tools, and integrating the resulting microservices into virtual environments (VEs) that allow deployment of personal and multi-tenant instances of this flexible, secure and high-performance e-infrastructure.
Ola Spjuth leads WP2: “Interoperability, Deployment and Security”.
PhenoMeNal EU-H2020 project
PhenoMeNal is a 3-year EU Horizon 2020 project, starting on September 1st 2015, that will develop a standardised e-infrastructure for analysing medical metabolic phenotype data. This comprises the development of standards for data exchange, pipelines, computational frameworks and resources for the processing, analysis and information mining of the massive amounts of medical molecular phenotyping and genotyping data that metabolomics applications, now entering research and the clinic, will generate.
Ola Spjuth leads Work Package 5: “Maintenance and Operation of PhenoMeNal grid/cloud e-Infrastructure”.
Project website: http://phenomenal-h2020.eu
Translational Bioinformatics
Translational bioinformatics is defined as: "The development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventative, and participatory health". Our group carries out research focused on translating massively parallel sequencing into clinical decision support via automated bioinformatics analysis, informatics solutions, and reporting systems. Projects include long-read amplicon sequencing of chronic myeloid leukemia (CML), TP53, and multi-drug-resistant bacteria. We are also part of the joint SeRC-eSSENCE flagship project "e-Science for Cancer Prevention and Control" (eCPC). Collaborators include the National Genomics Infrastructure (NGI), Uppsala University Hospital, and Karolinska Institutet.
Figure: Screenshot from our system for translating long-read amplicon sequencing into a decision aid for chronic myeloid leukemia (CML), showing mutation frequencies in the Philadelphia chromosome.
e-Science for Cancer Prevention and Control
The SeRC flagship project e-Science for Cancer Prevention and Control (eCPC) will set up a modular system for predicting cancer initiation and progression. It will be based on computational models that integrate data from different sources, including molecular (e.g. genomic, proteomic), environmental and lifestyle factors. By superimposing screening and prevention strategies on the models, reductions in over-treatment, morbidity, mortality and cost can be quantified.
Ola Spjuth leads WP1 (data management and integration) and is also member of the management group.
eCPC Website: http://ecpc.e-science.se