Secure computational genomics
Date
2024
Authors
Smajlović, Haris
Abstract
Biomedical data is scattered across biobanks and healthcare providers in multiple countries and is used extensively for research. Collaboration and data sharing among institutions often provide access to more diverse datasets and an opportunity to conduct more comprehensive studies. However, these collaborations are usually hindered by privacy concerns that make pooling such data in a centralized database impossible. To enable collaborative studies on such datasets, we present two easy-to-use domain-specific frameworks, Sequre and Shechi, for secure, high-performance computing on private, distributed datasets. Our frameworks automatically convert Pythonic code into a secure distributed equivalent, using secure multiparty computation (SMC) in Sequre and, for the first time, multiparty homomorphic encryption (MHE) in Shechi. They abstract away the private and distributed nature of the input data from end users through a familiar Pythonic syntax, new data types for the efficient handling of distributed data, and systematic compiler optimizations for cryptographic and distributed computation. We evaluate our frameworks on a wide range of applications, including complex genomic analysis tasks and statistical analysis of private electronic health records (EHRs). Our results demonstrate Sequre’s and Shechi’s ability to uncover optimizations missed even by expert developers, achieving up to 15× runtime improvements over the prior state-of-the-art solutions and a 40-fold improvement in code expressiveness compared to code manually optimized by experts. Finally, our frameworks enable non-trusting data owners to use their distributed datasets as a whole in collective studies and, as a result, facilitate data sharing and collaboration in privacy-sensitive fields such as biomedicine.
Keywords
Genomic privacy, Privacy-enhancing technologies, Programming languages