Genomic Data and Privacy

Michael Baudis

ETHZ Lecture

Understanding the impact of individual inherited and somatic genome variants on phenotypes and diseases requires thorough knowledge of how often such variants occur in populations in general and among carriers of those phenotypes and diseases in particular. This information can only be obtained by including data from a multitude of genome resources in variant evaluation efforts, including resources from other (international) jurisdictions. However, opening such resources carries an inherent risk of breaching privacy, particularly through re-identification of individuals or their relatives, and potentially through the exposure of genome-related personal information such as phenotypic and “performance” predictions and relative disease risks.

At the same time, direct-to-consumer (DTC) testing has become widespread, with millions of individuals depositing some form of genotyping data in open or closed repositories. This facilitates “long-range re-identification attacks”, in which individuals are identified through their relatives. Moreover, new DNA analysis techniques enable sequencing without expensive equipment or high-tech laboratories.

However, any evaluation of the potential harm from genomic data (ab)use has to be made in the context of existing risks and changing attitudes, e.g. the widespread, voluntary exposure of private and possibly compromising data on social media platforms.

This part of the course addresses issues related to the sharing of human genome and associated data: the “why” of data sharing and some current “how” concepts, such as the “Beacon” protocol and, more generally, the work of the Global Alliance for Genomics and Health (GA4GH); ways to breach genomic privacy and possible mitigations; and the (ab?)use of DTC-derived data repositories by law enforcement agencies for “long-range re-identification attacks”. These issues are then set in perspective with the “online social” behaviour of the “humans of the 2020s”.

