Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers

Published: 2015-04-30

Formatted citation

Gholami A, Laure E, Somogyi P, Spjuth O, Niazi S, Dowling J. Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers.
JOMB. 4, 2, 117-125. (2015). DOI: 10.12720/jomb.4.2.117-125

Abstract

Medical organizations collect, store and process vast amounts of sensitive information about patients. Easy access to this information by researchers is crucial to improving medical research, but in many institutions, cumbersome security measures and walled gardens have created a situation where even information about which medical data exist is not available. One of the main security challenges in this area is enabling researchers to cross-link different medical studies while preserving the privacy of the patients involved. In this paper, we introduce a privacy-preserving system for publishing sample availability data that allows researchers to make queries that crosscut different studies. That is, researchers can ask questions such as how many patients have had both diabetes and prostate cancer, where the diabetes and prostate cancer information originates from different clinical registries. We realize our solution with a two-level anonymization mechanism: our toolkit for publishing availability data first pseudonymizes personal identifiers and then anonymizes sensitive attributes. Our toolkit also includes a web-based server that stores the encrypted pseudonymized sample data and allows researchers to execute cross-linked queries across data from different studies. We believe that our toolkit contributes a first step toward supporting the privacy-preserving publication of data containing personal identifiers.
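
As an illustration of the cross-linking idea only, the following minimal Python sketch shows how two registries could publish pseudonymized identifiers and still answer a question such as "how many patients appear in both?". It is not the toolkit described in the paper. The abstract does not say how pseudonyms are derived, so the keyed hash (HMAC-SHA256), the shared key, the function names, and the sample identifiers are all assumptions made for this example. The second level, anonymizing sensitive attributes, and the encrypted storage on the server are not shown.

```python
import hashlib
import hmac

# Hypothetical shared secret. The abstract does not describe key management,
# so this constant is an assumption made purely for the sketch.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(personal_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a personal identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, which is
    what makes linking across registries possible.
    """
    return hmac.new(key, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def publish_availability(personal_ids: list[str]) -> set[str]:
    """Return the set of pseudonyms a registry would publish instead of raw IDs."""
    return {pseudonymize(pid) for pid in personal_ids}


def count_in_both(study_a: set[str], study_b: set[str]) -> int:
    """Count individuals present in both published studies via their pseudonyms."""
    return len(study_a & study_b)


# Made-up identifiers, for illustration only.
diabetes_ids = ["19500101-1234", "19620315-5678", "19711230-9012"]
prostate_cancer_ids = ["19620315-5678", "19800505-3456", "19500101-1234"]

published_diabetes = publish_availability(diabetes_ids)
published_cancer = publish_availability(prostate_cancer_ids)

# Cross-linked query: how many patients are in both registries?
overlap = count_in_both(published_diabetes, published_cancer)
print(overlap)
```

In this sketch the query side only ever sees pseudonyms, and the overlap count depends on both registries applying the same key. A keyed hash is used rather than a plain hash because identifiers drawn from a small, structured space could otherwise be recovered by hashing every candidate value.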