
Welcome to the baudisgroup Pages

The baudisgroup website presents projects and information from the Computational Oncogenomics Group of the University of Zurich (UZH) and the Swiss Institute of Bioinformatics (SIB). For visitors more interested in Particle Astrophysics, we strongly recommend the website of another, although related, Professor Baudis.

The Computational Oncogenomics Group's research focus lies in the exploration of structural genome variations in cancer. Our work centres around our Progenetix resource of curated molecular-cytogenetic and sequencing data. Specific projects explore computational methods, genomics of selected tumour entities and genomic variant patterns across malignancies. As members of the Global Alliance for Genomics and Health, the group is developing standards in biocuration and data sharing for genomic variants and phenotypic data, for instance by driving the development of the ELIXIR Beacon project. Other research relates to genome data epistemology, e.g. geographic and diagnostic sampling biases in cancer studies.

Latest News & Publications

Upcoming: GA4GH Connect April 2024 in Ascona

Spring 2024 GA4GH Connect working meeting co-organized by our group

We're proud to host the next Spring GA4GH Connect meeting in April 2024 at the Congressi Stefano Franscini on Monte Verità in Ascona. This will provide an excellent opportunity for Swiss genomics and bioinformatics to, well, connect with the international "genomics and health" community and projects.


Upcoming: Federated genomic discoveries: Deploying the GA4GH Beacon protocol

Virtual Seminar
GHGA Lecture Series

With the ever-increasing amount of genomic data produced in the context of research studies, population analyses and medical diagnostics, the need for access to genomic information beyond administrative or geographic boundaries has become a matter of eminent importance.

Upcoming: Genomic Data Sharing Standard Development with GA4GH and ELIXIR
Opportunities and Pitfalls in Federated Data Discovery

DMLS Lecture Series
University of Zurich Department of Molecular Life Sciences

In this presentation Michael talks about the role of GA4GH (and ELIXIR) in the development of standards and practices for genomic data exchange, some general principles, how his group got involved in these efforts, but also some pitfalls.


Structural Genome Variations in Cancer and the Case for Open Data Standards

Cancer Genomics Seminar at Utrecht
Hubrecht Institute and Princess Maxima Center for Pediatric Oncology

The presentation includes notes about work on improving the representation of genomic copy number variations (CNV), GA4GH and its Beacon protocol, as well as challenges to genomic data privacy.


labelSeg: segment annotation for tumor copy number alteration profiles

A tool to assign relative SCNA levels to segments

Hangjia Zhao and Michael Baudis

Briefings in Bioinformatics (Oxford). 2024 Jan 31;2024:bbad541.

Abstract Somatic copy number alterations (SCNAs) are a predominant type of oncogenomic alterations that affect a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations, generating results consisting of frequently large sets of SCNA segments. However, the automated annotation and integration of such data are particularly challenging because the measured signals reflect biased, relative copy number ratios. In this study, we introduce labelSeg, an algorithm designed for rapid and accurate annotation of CNA segments, with the aim of enhancing the interpretation of tumor SCNA profiles.
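To illustrate why relative copy number ratios complicate automated annotation, here is a deliberately naive Python sketch. It is not the labelSeg algorithm. It assumes, hypothetically, that the median segment value marks the copy-neutral baseline, re-centres all segment log2 ratios on it, and applies fixed, illustrative thresholds of ±0.15.

```python
from statistics import median


def label_segments(log2_values, gain=0.15, loss=-0.15):
    """Naive relative-level labeling of segment log2 ratios.

    Assumes (hypothetically) that the median segment value marks the
    copy-neutral baseline; thresholds are illustrative, not tuned.
    Returns "+1" (relative gain), "-1" (relative loss) or "0" per segment.
    """
    baseline = median(log2_values)
    labels = []
    for value in log2_values:
        shifted = value - baseline  # correct for a global shift of the profile
        if shifted >= gain:
            labels.append("+1")
        elif shifted <= loss:
            labels.append("-1")
        else:
            labels.append("0")
    return labels


# Hypothetical profile whose raw values are shifted upwards by about 0.3
print(label_segments([0.30, 0.32, 0.05, -0.40, 0.31]))  # ['0', '0', '-1', '-1', '0']
```

Without the median shift, the three segments near 0.3 would all be called gains. Simple median centring can itself fail, for example when the most frequent level in a highly aneuploid profile is not copy-neutral, which is the kind of problem a dedicated annotation method has to address.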

Beaconize this: Databases for Cancer Genomics and the Development of Open Data Standards

Seminar at the Bioinformatics club of the Centre de Recherche des Cordeliers (CRC)
Université Paris Cité

In this seminar at the Centre de Recherche des Cordeliers in Paris, Michael presents the work of the group, with special emphasis on the role of the Progenetix oncogenomics resources and tools in the development, implementation and testing of the Beacon standard of the Global Alliance for Genomics and Health (GA4GH).


pgxRpi Accepted by Bioconductor

The R wrapper for the Progenetix API, pgxRpi, is now part of the Bioconductor 'devel' branch and will be included in the upcoming 3.19 release in mid-April 2024.


CNV Project at biohackathon23

Participating in #BioHackEU23 in Barcelona with a CNV reference resource project

Together with other members of the hCNV community, some of us will participate in this year's BioHackathon Europe. The main project will address the creation of a template for a "beaconized" public resource for reference (i.e. not disease-associated) copy number variation data, including the necessary tooling for importing variants from e.g. VCF or BED files into Beacon backends (such as our bycon environment).

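The import tooling itself is not described here. As a rough idea of a first step, the following Python sketch parses BED lines into simple dictionaries, assuming (hypothetically) that the fourth (name) column carries a DUP or DEL label. The key names loosely echo Beacon-style camelCase but are illustrative only and are not the Beacon v2 or bycon data model; the example coordinates are made up.

```python
def bed_to_records(bed_text):
    """Parse CNV calls from BED text into simple variant dictionaries.

    Assumptions (hypothetical): column 4 holds "DUP" or "DEL"; other rows
    are skipped. BED coordinates are 0-based, half-open and kept as-is.
    """
    records = []
    for line in bed_text.splitlines():
        # skip blank lines, comments and UCSC track/browser header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4 or fields[3] not in ("DUP", "DEL"):
            continue
        chrom = fields[0]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # normalize "chr8" -> "8"
        records.append({
            "referenceName": chrom,
            "start": int(fields[1]),
            "end": int(fields[2]),
            "variantType": fields[3],
        })
    return records


example = "track name=cnv\nchr8\t127000000\t129000000\tDUP\nchr9\t21900000\t22000000\tDEL\n"
for record in bed_to_records(example):
    print(record)
```

A real importer would additionally need to handle VCF input, sample and dataset identifiers, and the mapping of labels to the target backend's variant type terms.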

Swiss-Korean Life Science Symposium

The 10th Swiss-Korean Life Science Symposium in Seoul

As a representative of the Swiss delegation, and particularly of the University of Zurich (UZH), Michael will be a presenter and panel discussion participant at the 10th Swiss-Korean Life Science Symposium in Seoul, together with members of the Swiss and Korean life sciences and personalized health academic and industrial communities.


Progenetix as SIB and ELIXIR Resource

Recognizing the Progenetix platform as a Swiss contribution to the European bioinformatics resources ecosystem

The Progenetix resource has finally been recognized as an official contribution to the ELIXIR European bioinformatics ecosystem. Besides Expasy, Progenetix is now also linked through ELIXIR's resource page. Or just go directly to progenetix.org (and its daughter project cancercelllines.org).


Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines

Literature-derived annotations as entry point for data exploration

Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis and Kurt Stockinger

doi: https://doi.org/10.48550/arXiv.2307.00933

Motivation: With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume (Lubowitz et al., 2021). As a consequence, in the fields of biological, medical and clinical research, domain experts have to sift through massive amounts of scientific text to find relevant information. However, this process is extremely tedious and slow to be performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction. Results: In this work, we present the design, implementation and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data in the domain of cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.

Availability and Implementation: Our system is publicly available on the web at cancercelllines.org.

Contact: The authors can be contacted at ellery.smith@zhaw.ch or rahel.paloots@uzh.ch.


Short tandem repeat mutations regulate gene expression in colorectal cancer

Exploring STR patterns and their relation to expression changes in cancer

Max A Verbiest, Oxana Lundström, Feifei Xia, Michael Baudis, Tugce Bilgin Sonay, Maria Anisimova

doi: https://doi.org/10.1101/2023.11.29.569189

Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression levels to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. The increased mutability of eSTRs in MSI tumours may be an early indication that eSTR mutations can confer a selective advantage to tumours. Future extensions of our findings into larger cohorts could uncover new STR-based targets in the treatment of cancer.


ELIXIR All Hands Dublin

Baudisgroup presentations at the AHM 2023 in Dublin

Rahel, Hangjia & Michael for the group

At the ELIXIR All Hands Meeting 2023 in Dublin our group presented several posters about our resources and work in standards development.


Phenopacket-tools: Building and validating GA4GH Phenopackets

Bioinformatics tools and examples for working with the Phenopackets standard
Danis D, Jacobsen JOB, Wagner AH, Groza T, Beckwith MA, Rekerle L, Carmody LC, Reese J, Hegde H, Ladewig MS, Seitz B, Munoz-Torres M, Harris NL, Rambla J, Baudis M, Mungall CJ, Haendel MA, Robinson PN. (2023) Phenopacket-tools: Building and validating GA4GH Phenopackets. PLoS One. 18:e0285433.

Abstract The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers.
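phenopacket-tools itself is a Java library. As a language-neutral illustration of the kind of object such builders produce and validators inspect, this Python sketch assembles a minimal phenopacket as a plain dictionary and runs one simplified check: every ontology prefix used in a phenotypic feature should be declared in `metaData.resources`. Field names follow our reading of the Phenopacket v2 JSON form; the identifiers, timestamp, curator name and the check itself are illustrative assumptions, not phenopacket-tools' API or its validation rules.

```python
import json


def make_phenopacket(pp_id, subject_id, features):
    """Build a minimal Phenopacket-style dict; features are (CURIE, label) pairs."""
    return {
        "id": pp_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": curie, "label": label}} for curie, label in features
        ],
        "metaData": {
            "created": "2023-01-01T00:00:00Z",  # hypothetical timestamp
            "createdBy": "example-curator",     # hypothetical curator
            "resources": [{
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "unspecified",
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }],
            "phenopacketSchemaVersion": "2.0",
        },
    }


def undeclared_prefixes(phenopacket):
    """Simplified check: return ontology prefixes used but not declared."""
    declared = {r["namespacePrefix"] for r in phenopacket["metaData"]["resources"]}
    used = {f["type"]["id"].split(":", 1)[0] for f in phenopacket["phenotypicFeatures"]}
    return sorted(used - declared)


pp = make_phenopacket("example-1", "patient-1", [("HP:0001250", "Seizure")])
print(json.dumps(pp, indent=2))
print(undeclared_prefixes(pp))  # []
```

Keeping the phenopacket as plain JSON-compatible data means the same structure can be serialized, exchanged and then checked by a dedicated validator.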

Theoretical Cytogenetics and Oncogenomics

DMLS Tandem Talks

Michael Baudis

In this short presentation Michael provides an overview of the group's work in cancer genomics resources, data analysis and standard development, including the involvement in the Global Alliance for Genomics and Health (GA4GH).

  • Cancer Genome Profiles
  • Oncogenomic Data Resources
  • Bioinformatics Methods
  • Data Exchange Standards for Genomics and Personalized Health

ZHAW Visitors for Cancercelllines Hackathon

Implementing INODE-driven literature collections

For some months, our group has been working with members of Kurt Stockinger's team from the ZHAW on a cancer cell lines use case for the INODE project. In the last two days we had a site visit for a first implementation of the use-case-specific system on cancercelllines.org. More information to follow, and thanks to Ellery & Dimitris for the great work!


Candidate targets of copy number deletion events across 17 cancer types

Identifying cancer related genes against the background of somatic CNV events

Huang Q and Baudis M

doi: 10.3389/fgene.2022.1017657
Previous bioRxiv preprint (first version: 2022-06-29): doi.org/10.1101/2022.06.29.498080

Abstract Genome variation is the direct cause of cancer and driver of its clonal evolution. While the impact of many point mutations can be evaluated through their modification of individual genomic elements, even a single copy number aberration (CNA) may encompass hundreds of genes and therefore pose challenges to untangle potentially complex functional effects. However, consistent, recurring and disease-specific patterns in the genome-wide CNA landscape imply that particular CNA may promote cancer-type-specific characteristics. Discerning essential cancer-promoting alterations from the inherent co-dependency in CNA would improve the understanding of mechanisms of CNA and provide new insights into cancer biology and potential therapeutic targets.

Genomic Resource Built with GA4GH Standards

EORTC PAMM Firenze

Michael Baudis

This brief presentation introduces the Progenetix resource, the Global Alliance for Genomics and Health as a developer of standards for data sharing in biomedical genomics, as well as the use of Progenetix in GA4GH standards development.


Beacon v2 - Onboarding Strategies & Feature Examples

Beacon Sessions at GA4GH Connect

Michael Baudis

The Beacon Sessions at GA4GH Connect November 2022 targeted the migration of existing and implementation of new v2 Beacons, with emphasis on the "how to get there easily" rather than on all Beacon v2 features.

Genomics Data Federation through Global Alliance for Genomics and Health Standards: Development and Implementation of the GA4GH Beacon Protocol

Seminar Yonsei University Medical School Seoul

Michael Baudis

In this Seoul meeting presentation Michael introduces the Global Alliance for Genomics and Health and its involvement in genomics standards development, followed by a discussion of the Beacon protocol and the role of the Progenetix resource in its development.

Beacon v2 - Feature-rich Implementation of the Genomic Data Discovery Protocol

GA4GH 2022 Plenary Barcelona

Michael Baudis

The “Beacon” protocol - developed with support from ELIXIR, the European bioinformatics infrastructure organization, as a standard of the Global Alliance for Genomics and Health (GA4GH) - represents an emerging standard for an “Internet for Genomics”. While the initial version of the protocol served as a widely adopted test bed for the sharing of genomic variants over federated query systems connecting hundreds of internationally distributed resources, the version 2 of the protocol provides a framework for extended, metadata-rich query and response options in both public and restricted federated access scenarios.

GA4GH Phenopackets: A Practical Introduction

Phenopackets v2 introduction with practical examples

Ladewig MS, Jacobsen JO, Wagner AH, Danis D, Kassaby BE, Gargano M, Groza T, Baudis M, Steinhaus R, Seelow D, Bechrakis NE, Mungall CJ, Schofield PN, Elemento O, Smith L, McMurry JA, Munoz-Torres M, Haendel MA and Robinson PN

Abstract The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.

The Phenopacket software is available at github.com/phenopackets/.


Beacon - Ethical & Legal Aspects of a Genomic Data Discovery Protocol

DSI Ethics Project Pitch

Michael Baudis

Here Michael provides a very brief presentation about the GA4GH Beacon protocol, especially as a target for projects discussing the ethical implications of genome data discovery & sharing as well as the relevant legal frameworks, with emphasis on the Swiss context.

A cancer genomics resource built on GA4GH standards

Rahel Paloots, Michael Baudis

CGC St Louis 2022

Progenetix is a cancer genomics resource that includes genomic profiling data as well as biomedical annotations and provenance data for cancer studies. The main goal of the Progenetix database is to provide easy, open access for research studies and clinical diagnostics. To facilitate sharing of genomic data, Progenetix complies with and contributes to GA4GH and Beacon data standards. Beacon, developed with support from ELIXIR (the European bioinformatics infrastructure organization), started out as a protocol for sharing genomic variants over federated queries.


A cancer genomics reference resource powered by GA4GH standards

Roche Data Science Seminar

Michael Baudis

The presentation reports on the Progenetix cancer genomics resource and its role in the GA4GH ecosystem & the Beacon genomics API development process.


The GA4GH Phenopacket schema defines a computable representation of clinical data

Phenopackets v2 publication


Jacobsen JOB, Baudis M, Baynam GS, Beckmann JS, Beltran S, Buske OJ, Callahan TJ, Chute CG, Courtot M, Danis D, Elemento O, Essenwanger A, Freimuth RR, ... , Haendel MA, Robinson PN, The GAGHPMC.

Abstract Despite great strides made in the development and wide acceptance of standards for exchanging structured information about genomic variants, progress in standards for computational phenotype analysis for translational genomics has lagged behind. Phenotypic features (signs, symptoms, laboratory and imaging findings, results of physiological tests, etc.) are of high clinical importance, yet exchanging them in conjunction with genomic variation information is often overlooked or even neglected.

Implementation of the GA4GH Beacon protocol for discovery and sharing of genomic copy number variation data

ESHG Vienna 2022

Michael Baudis

Background & Objectives Genomic copy number variations (CNV) are a major contributor to inter-individual genomic variation, can be causative events in rare diseases, but especially represent the majority of the mutational landscape in most malignancies. While specific CNV events and some recurring patterns have contributed to the identification of individual cancer drivers and the recognition of cancer subtypes, the complexity of genomic CNV patterns requires large amounts of well-defined genomic profiles for statistically meaningful analyses. At the other end of the spectrum, in the area of rare disease genomics the potential pathogenicity of individual CNV events requires validation against a vast set of disease-related and reference genomic profiles and annotations.


Progenetix & BeaconPlus - An open cancer genomics resource on a stack of Beacon code...

ELIXIR All Hands Amsterdam 2022

Michael Baudis

Here Michael provides an overview of the multi-year trajectory of the Beacon API development, and how BeaconPlus & Progenetix have been utilized for "implementation driven design".


Beacon v2 and Beacon networks: A "lingua franca" for federated data discovery in biomedical genomics, and beyond

Beacon v2 publication

Rambla J, Baudis M, Ariosa R, Beck T, Fromont LA, Navarro A, Paloots R, Rueda M, Saunders G, Singh B, Spalding JD.

Human Mutation. 2022 Mar 17. PMID:35297548

Abstract Beacon is a basic data discovery protocol issued by the Global Alliance for Genomics and Health (GA4GH). The main goal addressed by version 1 of the Beacon protocol was to test the feasibility of broadly sharing human genomic data, through providing simple "yes" or "no" responses to queries about the presence of a given variant in datasets hosted by Beacon providers.

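Conceptually, a version 1 Beacon answers a single question: does a dataset contain a given allele? The minimal Python sketch below models that yes/no behaviour over an in-memory set of variants. The parameter names mirror the v1 query parameters (referenceName, start, referenceBases, alternateBases) in snake_case, but the function, the made-up positions and the response shape (only an `exists` flag) are simplified assumptions, not a real Beacon implementation, which runs as a web service and returns further metadata.

```python
from typing import NamedTuple


class Allele(NamedTuple):
    reference_name: str
    start: int  # coordinate convention is an assumption of this sketch
    reference_bases: str
    alternate_bases: str


def beacon_v1_query(dataset, reference_name, start, reference_bases, alternate_bases):
    """Return a Beacon-v1-like boolean answer for one allele query."""
    query = Allele(reference_name, start, reference_bases, alternate_bases)
    return {"exists": query in dataset}


# Hypothetical dataset with two alleles (positions are made up)
dataset = {
    Allele("17", 1000, "C", "T"),
    Allele("1", 2000, "G", "A"),
}
print(beacon_v1_query(dataset, "17", 1000, "C", "T"))  # {'exists': True}
print(beacon_v1_query(dataset, "17", 1000, "C", "G"))  # {'exists': False}
```

Using a hashable tuple type makes each lookup a constant-time set membership test, which matches the "presence or absence" nature of the v1 response.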

hCNV Implementation Studies Old and New

ELIXIR Human Data Communities

Michael Baudis

This presentation provided an overview of the hCNV community, implementation studies and ongoing work, e.g. interaction with the GA4GH VRS standard group and Beacon development.


Technical, legal and ethics aspects of genomic data sharing

Genomes | Privacy | Laws | Society - DSI Ethics Seminar

Michael Baudis

The presentation introduces the need for sharing and federated discovery of genome data in the contexts of personalized health and genomic research, and some of the current developments in international standards and practices in the area.

The GA4GH Phenopacket schema: A computable representation of clinical data for precision medicine

Phenopackets v2 preprint

Jacobsen JOB, Baudis M, Baynam GS, Beckmann JS, Beltran S, Callahan TJ, Chute CG, Courtot M, Danis D, Elemento O, Freimuth RR, ..., Haendel MA, Robinson PN.

medRxiv, 2021.11.27.21266944. doi:10.1101/2021.11.27.21266944

Abstract Despite great strides in the development and wide acceptance of standards for exchanging structured information about genomic variants, there is no corresponding standard for exchanging phenotypic data, and this has impeded the sharing of phenotypic information for computational analysis. Here, we introduce the Global Alliance for Genomics and Health (GA4GH) Phenopacket schema, which supports exchange of computable longitudinal case-level phenotypic information for diagnosis and research of all types of disease including Mendelian and complex genetic diseases, cancer, and infectious diseases.

The GA4GH Variation Representation Specification (VRS): a Computational Framework for the Precise Representation and Federated Identification of Molecular Variation.

Alex H. Wagner, Lawrence Babb, Gil Alterovitz, Michael Baudis, Matthew Brush, Daniel L. Cameron, Melissa Cline, Malachi Griffith, Obi L. Griffith, ..., Melissa Konopko, Heidi L. Rehm, Andrew D. Yates, Robert R. Freimuth, Reece K. Hart

Wagner, Alex H. et al. Cell Genomics, Volume 1, Issue 2, 100027 doi:10.1016/j.xgen.2021.100027
bioRxiv 2021.01.15.426843 (2021-01-15)

Note

This article was published as part of a special GA4GH edition of Cell Genomics.

Abstract Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced “verse”), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers.
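VRS computed identifiers are derived from a digest of a normalized serialization of the variation object. The Python sketch below shows the truncated-digest idea used in GA4GH identifiers (SHA-512, first 24 bytes, URL-safe base64, named sha512t24u) applied to a simplified serialization (sorted keys, no whitespace). It does not implement the full VRS serialization and normalization rules, so its output will not match official VRS identifiers; the `simplified_computed_id` helper and the object fields are hypothetical.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded (32 chars)."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def simplified_computed_id(obj: dict, type_prefix: str = "VA") -> str:
    """Illustrative identifier only; NOT the official VRS serialization."""
    # sorted keys (also in nested dicts) and compact separators make the
    # bytes, and therefore the digest, independent of key order
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"ga4gh:{type_prefix}.{sha512t24u(blob)}"


allele = {"type": "Allele", "location": {"start": 100, "end": 101}, "state": "T"}
reordered = {"state": "T", "location": {"end": 101, "start": 100}, "type": "Allele"}
print(simplified_computed_id(allele))
print(simplified_computed_id(allele) == simplified_computed_id(reordered))  # True
```

Because the identifier is computed from content rather than assigned by a registry, independent resources that serialize the same variation identically arrive at the same identifier, which is what enables federated identification.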

International federation of genomic medicine databases using GA4GH standards

Adrian Thorogood, Heidi L. Rehm, Peter Goodhand, Angela J.H. Page, Yann Joly, Michael Baudis, Jordi Rambla, Arcadi Navarro, Tommi H. Nyronen, Mikael Linden, Edward S. Dove, Marc Fiume, Michael Brudno, Melissa S. Cline, Ewan Birney

Thorogood, Adrian et al. Cell Genomics, Volume 1, Issue 2, 100032 doi:10.1016/j.xgen.2021.100032

Note

This article was published as part of a special GA4GH edition of Cell Genomics.

Abstract We promote a shared vision and guide for how and when to federate genomic and health-related data sharing, enabling connections and insights across independent, secure databases. The GA4GH encourages a federated approach wherein data providers have the mandate and resources to share, but where data cannot move for legal or technical reasons. We recommend a federated approach to connect national genomics initiatives into a global network and precision medicine resource.


GA4GH: International policies and standards for data sharing across genomic research and healthcare

Heidi L. Rehm, Angela J.H. Page, Lindsay Smith, Jeremy B. Adams, Gil Alterovitz, Lawrence J. Babb, Maxmillian P. Barkley, Michael Baudis, Michael J.S. Beauvais, Tim Beck, Jacques S. Beckmann, Sergi Beltran, David Bernick, Alexander Bernier, James K. Bonfield, Tiffany F. Boughtwood, Guillaume Bourque, Sarion R. Bowers, Anthony J. Brookes, Michael Brudno, Matthew H. Brush, David Bujold, Tony Burdett, Orion J. Buske, Moran N. Cabili, Daniel L. Cameron, Robert J. Carroll, Esmeralda Casas-Silva, Debyani Chakravarty, Bimal P. Chaudhari, Shu Hui Chen, J. Michael Cherry, Justina Chung, Melissa Cline, Hayley L. Clissold, Robert M. Cook-Deegan, Mélanie Courtot, ..., Peter Goodhand, Kathryn North, Ewan Birney

Rehm, Heidi L. et al. Cell Genomics, Volume 1, Issue 2, 100029 doi:10.1016/j.xgen.2021.100029

Note

This article was published as part of a special GA4GH edition of Cell Genomics.

Abstract The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.

A cancer genomics resource built around and driving GA4GH standards

GRIC sponsored workshop with the Swiss Institute of Bioinformatics

Michael Baudis

The Progenetix oncogenomics resource provides sample-specific cancer genome profiling data and biomedical annotations as well as provenance data for cancer studies. With more than 100k genomic copy number variation (CNV) profiles from over 700 cancer types, Progenetix empowers comparative analyses beyond individual studies and diagnostic concepts.


A Standardized Format for Federated Genomic Data Exchange

The GA4GH Beacon Protocol Presented at BC2 Basel 2021
Session "Federating computational analyses with GA4GH standards"

Michael Baudis

During the "Federating computational analyses with GA4GH standards" workshop at BC2 2021, Michael presented the history and current status of the Beacon project, as well as its integration with specific data resources and analysis initiatives.


Cancer genomics reference resource and toolkit around GA4GH standards

ESHG 2021

Q. Huang, B. Gao, R. Paloots, P. Carrio-Cordo, Z. Yang, M. Baudis

This poster presentation at the European Society of Human Genetics meeting 2021 discusses the integration and development of GA4GH standards by the Progenetix oncogenomics resource.


Progenetix - An open reference resource for copy number variation data in cancer

Qingyao Huang


The Progenetix oncogenomic resource in 2021

Article describing the current content & technical status of progenetix.org

Qingyao Huang, Paula Carrio Cordo, Bo Gao, Rahel Paloots, Michael Baudis

Database (Oxford). 2021 Jul 17;2021:baab043.


Abstract In cancer, copy number aberrations (CNAs) represent a type of nearly ubiquitous and frequently extensive structural genome variations. To disentangle the molecular mechanisms underlying tumorigenesis as well as identify and characterize molecular subtypes, the comparative and meta-analysis of large genomic variant collections can be of immense importance. Over the last decades, cancer genomic profiling projects have resulted in a large amount of somatic genome variation profiles, however segregated in a multitude of individual studies and datasets. The Progenetix project, initiated in 2001, curates individual cancer CNA profiles and associated metadata from published oncogenomic studies and data repositories with the aim to empower integrative analyses spanning all different cancer biologies.

hCNV Community and Implementation Studies

Michael Baudis

ELIXIR All Hands 2021 Human Data Day

At the Human Data Day Michael presents a very brief overview of the ending and upcoming ELIXIR hCNV implementation studies.


hCNV data and the Progenetix Beacon

Presentation at ELIXIR All Hands 2021

Michael Baudis

ELIXIR All Hands 2021

This presentation gives a brief overview of the use of the Progenetix resource to test and implement a genomics reference resource using the emerging Beacon v2 protocol.


Signatures of Discriminative CNA in 31 Cancer Subtypes

Bo Gao and Michael Baudis (2021)

Published in Frontiers in Genetics, 2021-05-13

Abstract

Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles pose great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have been focused on the association of CNA to pre-selected “driver” genes with limited application to rare drivers and other genomic elements.


Progenetix, Beacon and GA4GH at RDA

Research Data Alliance - RDA Virtual Plenary 17

Concepts | Status | History | Outlook

Michael Baudis


This seminar gives an overview of the current state of the Progenetix Beacon project and the overall connection to the Global Alliance for Genomics and Health (GA4GH).


Implementing GA4GH Standards to Drive an Open Oncogenomics Resource

Research Seminar Kinderspital Zürich - Neuroonkologie

Michael Baudis

Seminar Neurooncology Children's Hospital Zürich

This seminar gives an overview of the history & current state of the Progenetix resource, its role in Beacon API development and the overall connection to the Global Alliance for Genomics and Health (GA4GH).


Discovering copy number variation across multiple cancer types

Qingyao Huang

Abstract

Genomic variations are a direct cause of tumor formation and an accomplice in its continuous evolution. While point mutations can be pinpointed to a targeted genetic element, copy number variations (CNVs) involve copy number gain or loss of a large DNA segment which often covers hundreds of genetic elements in one event.

EACR conference - The Progenetix Oncogenomic Resource


GA4GH Connect - Beacon v2 and SchemaBlocks

GA4GH Connect 2020

Michael Baudis

Beacon v2 Structural Variants [slides]
SchemaBlocks {S}[B] [slides]

Copy number variant heterogeneity among cancer types reflects inconsistent concordance with diagnostic classifications

Paula Carrio Cordo and Michael Baudis

bioRxiv. doi.org/10.1101/2021.03.01.433348

This article explores the correlation between subsets of cancer entities, grouped by their somatic CNV patterns, and levels of diagnostic classification systems.

Continue reading

Genomic data and Privacy

Michael Baudis

ETHZ Lecture

Understanding the impact of individual inherited and somatic genome variants on phenotypes and diseases requires thorough knowledge of the occurrence of such variants among populations in general and among carriers of the phenotypes and diseases in particular. This information can only be provided by including data from a multitude of genome resources in variant evaluation efforts, including resources from other (international) jurisdictions. However, opening such resources carries the inherent risk of breaching privacy, particularly through re-identification of individuals or their relatives, and potentially through the exposure of individual genome-related personal information, including phenotypic and "performance" predictions and relative disease risk.

Continue reading

Beacon v2 – Towards flexible use and clinical applications for a reference genomic data protocol

SPHN Webinar

Michael Baudis

Genomic “Beacons” provide discovery services for genomic data using the Beacon API developed as a key driver project of the Global Alliance for Genomics and Health (GA4GH). The Beacon protocol itself defines an open standard for genomics data discovery and provides a framework for web services responding to queries against genomic data collections, for instance from population-based or disease-specific genome repositories. Continue reading
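To make the query side concrete, here is a minimal Python sketch of a Beacon v1-style allele query. It assumes a deployment that accepts the v1 `GET /query` endpoint with the `referenceName`, `start` (0-based), `referenceBases`, `alternateBases` and `assemblyId` parameters; the base URL, the example position and the helper name `build_beacon_v1_query` are hypothetical.

```python
from urllib.parse import urlencode


def build_beacon_v1_query(base_url: str, reference_name: str, start: int,
                          reference_bases: str, alternate_bases: str,
                          assembly_id: str = "GRCh38") -> str:
    """Build a Beacon v1-style allele query URL for GET /query (sketch)."""
    params = {
        "referenceName": reference_name,   # chromosome, e.g. "17"
        "start": start,                    # 0-based position in Beacon v1
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,
    }
    # urlencode keeps the insertion order of the dict and stringifies the int
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


url = build_beacon_v1_query("https://beacon.example.org/api", "17", 7577120, "G", "A")
print(url)
# https://beacon.example.org/api/query?referenceName=17&start=7577120&referenceBases=G&alternateBases=A&assemblyId=GRCh38
```

A v1 beacon answers such a request essentially with a yes/no `exists` flag, which is what makes it a discovery service rather than a data delivery service.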

Update of Progenetix Oncogenomics Resource

Research Progress Report, DMLS, University of Zurich

Qingyao Huang

Copy number aberration (CNA) is frequently observed in cancer genomes. Meta-analysis of genomic variations helps to disentangle the complex molecular mechanisms underlying tumorigenesis as well as to identify and characterize molecular subtypes. Over the years, cancer genomic research has resulted in a considerable amount of data segregated by study. The Progenetix project (www.progenetix.org), initiated in 2001, aims to systematize published cancer genomic profiles and provide accurate annotation to facilitate integrative analysis. Continue reading

Welcome to Ziying

Today Ziying Yang arrived as a new member of the baudisgroup.

Welcome Ziying!

Continue reading

GA4GH Beacon v2 at GA4GH Plenary

GA4GH Beacon v2 - Evolving Reference Standard for Genomic Data Exchange

GA4GH 8th Plenary

Gary Saunders, Jordi Rambla de Argila, Anthony Brookes, Juha Törnroos and Michael Baudis

For the ELIXIR Beacon project, GA4GH Discovery work stream and the international network of Beacon API developers

The Beacon driver project was one of the earliest initiatives of the Global Alliance for Genomics and Health, with the Beacon v1.0 API as the first approved GA4GH standard. Version 2 of the protocol is slated to provide fundamental changes towards a foundational standard for an "Internet of Genomics":
* requests beyond genomic variants ("filters")
* payload responses, secured through open AAI
* alignment with GA4GH standards (Phenopackets, VRS, DUO...) through SchemaBlocks {S}[B]
* work with international partners on the deployment of advanced implementations

Continue reading
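Beacon v2 "filters" let a request reach beyond the variant itself, e.g. restricting matches to samples with a given diagnosis. The Python sketch below assembles such a POST request body. The `meta` / `query` / `filters` / `requestedGranularity` layout reflects our reading of the Beacon v2 framework and should be checked against the specification; the helper name is hypothetical and `NCIT:C3058` (glioblastoma) is only an illustrative filter term.

```python
import json
from typing import Iterable, Mapping, Optional


def build_beacon_v2_request(filter_ids: Iterable[str],
                            request_parameters: Optional[Mapping] = None,
                            granularity: str = "count") -> dict:
    """Assemble a Beacon v2-style POST body (layout assumed, see lead-in)."""
    query = {
        # each filter is an ontology term, e.g. a diagnosis code
        "filters": [{"id": fid} for fid in filter_ids],
        # requested level of detail, e.g. "boolean", "count" or "record"
        "requestedGranularity": granularity,
    }
    if request_parameters:
        # optional genomic parameters; valid names depend on the endpoint
        query["requestParameters"] = dict(request_parameters)
    return {"meta": {"apiVersion": "2.0"}, "query": query}


body = build_beacon_v2_request(["NCIT:C3058"])
print(json.dumps(body, indent=2))
```

Keeping filters separate from the genomic parameters mirrors the idea in the list above: the same request shape can ask about variants, about samples with a diagnosis, or both.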

Progenetix - A cancer genomics reference resource around GA4GH standards

GA4GH 8th Plenary

Michael Baudis

The Progenetix oncogenomics resource provides sample-specific cancer genome profiling data and biomedical annotations as well as provenance data from cancer studies. Especially through its currently 113,322 curated genomic copy number variation (CNV) profiles from 1,600 individual studies representing over 500 cancer types (NCIt), Progenetix enables aggregate and comparative analyses that vastly exceed the scope of individual studies or single diagnostic concepts. Continue reading

Cancer Data - ELIXIR::GA4GH: Advancing genomics resources through standards and ontologies

ECCB2020

Michael Baudis

Continue reading

The Ubiquitin Ligase TRIP12 Limits PARP1 Trapping and Constrains PARP Inhibitor Efficiency

Marco Gatti, Ralph Imhof, Qingyao Huang, Michael Baudis, Matthias Altmeyer

Cell Rep. 2020 Aug 4 DOI: 10.1016/j.celrep.2020.107985

Abstract PARP inhibitors (PARPi) cause synthetic lethality in BRCA-deficient tumors. Whether specific vulnerabilities to PARPi exist beyond BRCA mutations and related defects in homology-directed repair (HDR) is not well understood. Here, we identify the ubiquitin E3 ligase TRIP12 as negative regulator of PARPi sensitivity. Continue reading

Beacon v2 - Towards Flexible Use and Clinical Applications for a Reference Genomic Data Sharing Protocol

Personalized Health Technologies 2020

Michael Baudis

Beacons provide discovery services for genomic data using the Beacon API developed under the leadership of ELIXIR, as a key driver project of the Global Alliance for Genomics and Health (GA4GH). The Beacon protocol itself defines an open standard for genomics data discovery. It provides a framework for public web services responding to queries against genomic data collections, for instance from population-based or disease-specific genome repositories. Sites offering beacons can scale through aggregation in "Beacon Networks", which distribute single genome queries among a potentially large number of international beacons and assemble their responses. Continue reading
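The aggregation step of a Beacon Network can be pictured as a fan-out/fan-in: one query goes to many beacons and the answers are merged. The minimal Python sketch below simulates this with local functions standing in for remote beacons; the names `query_network` and `BeaconFn` are hypothetical, and a real network would issue HTTP requests instead of function calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a remote beacon: takes a query, returns True/False.
BeaconFn = Callable[[dict], bool]


def query_network(beacons: Dict[str, BeaconFn], query: dict,
                  timeout: float = 10.0) -> dict:
    """Send one query to every beacon in parallel and assemble the answers."""
    responses: Dict[str, Optional[bool]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(beacons))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in beacons.items()}
        for name, future in futures.items():
            try:
                responses[name] = bool(future.result(timeout=timeout))
            except Exception:
                # an unreachable or failing beacon must not sink the whole query
                responses[name] = None
    return {
        "exists": any(r is True for r in responses.values()),
        "responses": responses,
    }


def failing_beacon(query: dict) -> bool:
    raise ConnectionError("beacon offline")


beacons = {
    "beacon_a": lambda q: q["referenceName"] == "17",
    "beacon_b": lambda q: False,
    "beacon_c": failing_beacon,
}
result = query_network(beacons, {"referenceName": "17", "start": 7577120})
print(result)
```

Recording a failed beacon as `None` rather than `False` keeps "not found" distinct from "could not ask", which matters when responses from many independent sites are assembled.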

ELIXIR All Hands - Beacon Evolution

ELIXIR All Hands 2020 - Beacon Workshop

Michael Baudis

This presentation covers some of Beacon's origins, features and directions.

Continue reading

Oncology Informatics: Status Quo and Outlook - Review

Paul Martin Putora, Michael Baudis, Beth M. Beadle, Issam El Naqa, Frank A. Giordano and Nils H. Nicolay

Oncology, 2020-05-14. DOI 10.1159/000507586 (Review)

Abstract Oncology has undergone rapid progress, with emerging developments in areas including cancer stem cells, molecularly targeted therapies, genomic analyses, and individually tailored immunotherapy. These advances have expanded the tools available in the fight against cancer. Some of these have seen broad media coverage resulting in justified public attention. However, these achievements have only been possible due to rapid developments in the expanding field of biomedical informatics and information technology (IT). Continue reading

Swissnex SF: Laura & Michael Baudis - Life & Family

In an interview originally planned for their stays at UCB and LBNL, Tabea Stoeckel from swissnex San Francisco talked with Laura & Michael about their stay in the Bay Area and their research & family life as internationally active scientists.

Continue reading

Minimum Error Calibration and Normalization for Genomic Copy Number Analysis

Bo Gao and Michael Baudis (2020)

bioRxiv, 2019-07-31. DOI 10.1101/720854
Genomics, Volume 112, Issue 5, September 2020, Pages 3331-3341, accepted 2020-05-06 doi.org/10.1016/j.ygeno.2020.05.008.
Background

Copy number variations (CNV) are regional deviations from the normal autosomal bi-allelic DNA content. While germline CNVs are a major contributor to genomic syndromes and inherited diseases, the majority of cancers accumulate extensive "somatic" CNV (sCNV or CNA) during the process of oncogenetic transformation and progression. While specific sCNV have closely been associated with tumorigenesis, intriguingly many neoplasias exhibit recurrent sCNV patterns beyond the involvement of a few cancer driver genes. Continue reading
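Because CNVs are defined relative to the bi-allelic (two-copy) baseline, copy number data are commonly expressed as log2 ratios, log2(CN / 2), so that a one-copy gain gives about +0.58 and a one-copy loss gives -1. The Python sketch below computes this ratio and labels a segment as gain, loss or neutral; the ±0.15 thresholds are illustrative assumptions, not values taken from the paper.

```python
import math


def log2_ratio(copy_number: float, ploidy: int = 2) -> float:
    """Log2 ratio of an absolute copy number against the diploid baseline."""
    if copy_number <= 0:
        return float("-inf")  # homozygous deletion: no copies left
    return math.log2(copy_number / ploidy)


def call_segment(log2_value: float, gain_threshold: float = 0.15,
                 loss_threshold: float = -0.15) -> str:
    """Classify a segment; the thresholds are illustrative, not calibrated."""
    if log2_value >= gain_threshold:
        return "gain"
    if log2_value <= loss_threshold:
        return "loss"
    return "neutral"


for cn in (0, 1, 2, 3, 4):
    ratio = log2_ratio(cn)
    print(cn, f"{ratio:.2f}", call_segment(ratio))
```

The log scale makes gains and losses roughly symmetric around 0, which is why thresholds are usually set as a pair around the neutral baseline.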

SchemaBlocks and GA4GH TASC

A brief presentation about SchemaBlocks concepts and their possible integration into the new GA4GH TASC effort.

Continue reading

A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer

Alex H. Wagner, Brian Walsh, Georgia Mayfield, David Tamborero, Dmitriy Sonkin, Kilannin Krysiak, Jordi Deu-Pons, Ryan P. Duren, Jianjiong Gao, Julie McMurry, Sara Patterson, Catherine del Vecchio Fitz, Beth A. Pitel, ..., Nuria Lopez-Bigas, Mark Lawler, Jeremy Goecks, Malachi Griffith, Obi L. Griffith, Adam A. Margolin & Variant Interpretation for Cancer Consortium

Nature Genetics volume 52, pages 448–457 (2020)

Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. Continue reading

Geographic assessment of cancer genome profiling studies

Paula Carrio Cordo, Elise Acheson, Qingyao Huang and Michael Baudis (2020)

DATABASE, Volume 2020, 2020, baaa009, doi.org/10.1093/database/baaa009
bioRxiv preprint, 2020-01-11. DOI 10.1101/827683

Abstract Cancers arise from the accumulation of somatic genome mutations, which can be influenced by inherited genomic variants and external factors such as environmental or lifestyle-related exposure. Due to the heterogeneity of cancers, precise information about the genomic composition of germline and malignant tissues has to be correlated with morphological, clinical and extrinsic features to advance medical knowledge and treatment options. With global differences in cancer frequencies and disease types, geographic data is of importance to understand the interplay between genetic ancestry and environmental influence in cancer incidence, progression and treatment outcome. Continue reading

SWISSNEX SF Lunch Seminar - Data Mining in Genomics

Genomic Research and Personalised Health Strategies

Resources | Standards | Protocols | Tools | Discourse

These are the slides of a short presentation, given virtually (due to COVID-19) at a SWISSNEX San Francisco lunch meeting.

Continue reading

Enabling population assignment from cancer genomes with SNP2pop

Huang Q and Baudis M. (2020)

Sci Rep 10, 4846 (2020). doi.org/10.1038/s41598-020-61854-x

Abstract In many cancers, incidence, treatment efficacy and overall prognosis vary between geographic populations. Studies disentangling the contributing factors may help in both understanding cancer biology and tailoring therapeutic interventions. Ancestry estimation in such studies should preferably be driven by genomic data, due to frequently missing or erroneous self-reported or inferred metadata. While respective algorithms have been demonstrated for baseline genomes, such a strategy has not been shown for cancer genomes carrying a substantial somatic mutation load. We have developed a bioinformatics tool for the assignment of population groups from genome profiling data for both unaltered and cancer genomes. Continue reading

BBOP Presentation - Baudisgroup Projects & Interests

Continue reading

ELIXIR Beacon Project - Networking Resources Across and Beyond ELIXIR Human Data Communities

ELIXIR Open Day - Wellcome Trust Genome Campus Hinxton

Michael Baudis

In this presentation I introduce the Beacon project and give my views on its future trajectory, especially its role in driving the alignment of ELIXIR and GA4GH projects related to (human) genome data sharing.

Continue reading

GA4GH SchemaBlocks for Human Cell Atlas

This is a presentation of the SchemaBlocks initiative and the overall GA4GH context, for the Human Cell Atlas project, given by Michael at one of their teleconferences.

Continue reading

Geographic assessment of cancer genome profiling studies

Paula Carrio Cordo, Elise Acheson, Qingyao Huang and Michael Baudis (2020)

bioRxiv, 2020-11-01. DOI 10.1101/827683

Abstract Cancers arise from the accumulation of somatic genome mutations, which can be influenced by inherited genomic variants and external factors such as environmental or lifestyle-related exposure. Due to the heterogeneity of cancers, precise information about the genomic composition of germline and malignant tissues has to be correlated with morphological, clinical and extrinsic features to advance medical knowledge and treatment options. With global differences in cancer frequencies and disease types, geographic data is of importance to understand the interplay between genetic ancestry and environmental influence in cancer incidence, progression and treatment outcome. Continue reading

Talk at St. Gallen Radiation Oncology - Bioinformatics and Data Exchange

3rd St. Gallen Radiation Oncology Informatics Meeting

Bioinformatics and data exchange for genomics in an international context

Michael Baudis

The presentation at the St. Gallen meeting introduced the audience to the group's research and resources, and to how these are connected to the different national & international data standards and sharing initiatives.

Continue reading

Talk at AMED Tokyo - Cancer Genomics and Standards

Mini-Symposium about CNV and Data Standards at AMED Japan, Tokyo

Cancer Genomics and Implementation of Data Driven Standards for Genomic Data Exchange

Michael Baudis

At this meeting, several Japanese participants presented their research and results, with a focus on Copy Number Variants and other structural genome variations. Continue reading

Minimum Error Calibration and Normalization for Genomic Copy Number Analysis

BC2 2019, Basel

Bo Gao

Abstract

Background:
Copy number variations (CNV) are regional deviations from the normal autosomal bi-allelic DNA content. While germline CNVs are a major contributor to genomic syndromes and inherited diseases, the majority of cancers accumulate extensive "somatic" CNV (sCNV or CNA) during the process of oncogenetic transformation and progression. While specific sCNV have closely been associated with tumorigenesis, intriguingly many neoplasias exhibit recurrent sCNV patterns beyond the involvement of a few cancer driver genes. Currently, CNV profiles of tumor samples are generated using genomic micro-arrays or high-throughput DNA sequencing. Regardless of the underlying technology, genomic copy number data is derived from the relative assessment and integration of multiple signals, with the data generation process being prone to contamination from several sources. Estimated copy number values have no absolute and linear correlation to their corresponding DNA levels, and the extent of deviation differs between sample profiles, which poses a great challenge for data integration and comparison in large scale genome analysis. Continue reading
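One simple way to reduce between-profile deviation is to re-center each profile on its own baseline. The Python sketch below median-centers the log2 values of a single profile. This is a deliberately simpler, swapped-in technique and not the minimum error calibration described in the abstract: it corrects a constant baseline shift only, not the non-linear relation between estimated values and actual DNA levels.

```python
from statistics import median
from typing import List, Sequence


def median_center(log2_values: Sequence[float]) -> List[float]:
    """Shift a copy number profile so that its median log2 value becomes 0.

    Assumes most of the genome is copy-neutral, so the median approximates
    the baseline; this is not the minimum error calibration method.
    """
    if not log2_values:
        raise ValueError("empty profile")
    baseline = median(log2_values)
    return [value - baseline for value in log2_values]


profile = [0.5, 0.75, 0.25, 1.5, -0.5]  # made-up values with a +0.5 baseline shift
print(median_center(profile))  # [0.0, 0.25, -0.25, 1.0, -1.0]
```

The copy-neutral assumption breaks down for highly aneuploid tumors, which is one reason more elaborate calibration approaches are needed.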

Cancer cell lines in focus: somatic copy number & germline variation

BC2 2019, Basel

Qingyao Huang

Abstract

Background:
Human cell lines are convenient model systems in cancer research, for validation of proposed molecular mechanisms as well as to evaluate potential therapeutic approaches, e.g. through high-throughput screening of potential anti-tumour compounds against cancer cell line panels. However, conclusions about biological pathways or pharmacological potential depend on a close molecular relation between the cancer type represented and the cell line model used for analyses. Continue reading

Structural Genome Variants in Cancer: Research, resources, standards

Seminar at the University of Florence

Seminar Università degli Studi di Firenze - Dipartimento di Biologia

Michael Baudis

Abstract

Genomic copy number variations are major contributors to malignant transformation and progression and constitute - at least in their quantitative extension - the largest contributors to genomic mutation landscapes, in the majority of cancer types. Such mutations occur in the vast majority of tumors as somatic genome alterations (sCNV) during clonal development and expansion and are promoted by a variety of mechanisms leading to extended or focal changes in the number of genomic segments. Continue reading

Leveraging European infrastructures to access 1 million human genomes by 2022

Gary Saunders, Michael Baudis, Regina Becker, Sergi Beltran, Christophe Béroud, Ewan Birney, Cath Brooksbank, Søren Brunak, Marc Van den Bulcke, Rachel Drysdale, Salvador Capella-Gutierrez, Paul Flicek, ..., Niklas Blomberg, and Serena Scollen

Nature Reviews Genetics volume 20, pages 693–701 (2019)

Abstract Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.

Continue reading

Minimum Error Calibration and Normalization for Genomic Copy Number Analysis

Bo Gao and Michael Baudis (2019)

bioRxiv, 2019-07-31. DOI 10.1101/720854

Abstract Copy number variations (CNV) are regional deviations from the normal autosomal bi-allelic DNA content. While germline CNVs are a major contributor to genomic syndromes and inherited diseases, the majority of cancers accumulate extensive “somatic” CNV (sCNV or CNA) during the process of oncogenetic transformation and progression. While specific sCNV have closely been associated with tumorigenesis, intriguingly many neoplasias exhibit recurrent sCNV patterns beyond the involvement of a few cancer driver genes. Continue reading

ELIXIR All Hands - Beacon Introduction

Michael Baudis

This presentation was the opener for the ELIXIR Beacon session and introduced current developments, especially the interactions between GA4GH :: Discovery and ELIXIR Beacon.

Continue reading

HGVS 2019 - Development of Standards for Genomic Data Exchange

Human Genome Variation Society - Gothenburg 2019

Implementation Driven Development of Standards for Genomic Data Exchange from Cancer Genome Data Collections

Michael Baudis

Abstract

Cancers are genomic diseases, arising from the clonal propagation of somatic mutation events, with a limited contribution from inherited genomic variants. Genomic copy number variations are major contributors to malignant transformation and progression and constitute - at least in their quantitative extension - the largest contributors to genomic mutation landscapes, in the majority of cancer types. Continue reading

Connecting the silos - Genomic Data Standards, Resources and the Global Alliance for Genomics and Health

R&D Data Intelligence Leaders Forum Basel

Michael Baudis

Abstract

This presentation discusses the need for data sharing in genomics, provides information about the Global Alliance for Genomics and Health (GA4GH), and shows some of our group's contributions, especially regarding Beacon development.

Continue reading

Federated discovery and sharing of genomic data using Beacons

Miroslav Cupak, Stephen Keenan, Jordi Rambla, Sabela de la Torre, Stephanie Dyke, Anthony Brookes, Knox Carey, David Lloyd, Peter Goodhand, Maximilian Haeussler, Michael Baudis, Heinz Stockinger, Lena Dolman, Ilkka Lappalainen, Juha Törnroos, Mikael Linden, John Spalding, Saif Ur-Rehman, Angela Page, Paul Flicek, Susheel Varma, Gary Saunders, Serena Scollen, Stephen Sherry, David Haussler, Beacon Project Team

Nat Biotechnol (2019), accepted 2019-01-23

Abstract The Beacon Project (github.com/ga4gh-beacon/) is a GA4GH initiative that is developing an open specification for genetic variation discovery and sharing. The project is demonstrating the willingness of international organizations to work together to define standards for, and actively engage in, genomic data sharing. In the two years since the project’s inception, over 90 Beacons have been lit by 35 organizations serving over 200 datasets. Continue reading

DNA copy number imbalances in primary cutaneous lymphomas (PCL)

Gug G, Huang Q, Chiticariu E, Solovan C and Baudis M (2019)

JEADV, 2019-01-19. doi.org/10.1111/jdv.15442

The article was published in the Journal of the European Academy of Dermatology and Venereology on January 19, 2019. A corresponding preprint can be accessed through [bioRxiv].

Background

Cutaneous lymphomas (CL) represent a clinically defined group of extranodal non-Hodgkin lymphomas harbouring heterogeneous and incompletely delineated molecular aberrations. Over the past decades, molecular studies have identified several chromosomal aberrations, but the interpretation of individual genomic studies can be challenging.

Objective

With a comprehensive meta-analysis, we aim to delineate genomic alterations for different types of CL and propose a more accurate classification in line with their various pathogenicity. Continue reading

Enabling population assignment from cancer genomes with SNP2pop

Huang Q and Baudis M. (2019)

bioRxiv, 2019-01-14. doi.org/10.1101/368647 (first version 2018-07-14)

Abstract For a variety of human malignancies, incidence, treatment efficacy and overall prognosis show considerable variation between different populations and ethnic groups. Disentangling the effects related to particular population backgrounds can help in both understanding cancer biology and in tailoring therapeutic interventions. Because self-reported or inferred patient data can be incomplete or misleading due to migration and genomic admixture, a data-driven ancestry estimation should be preferred. While algorithms to analyze ancestry structure from healthy individuals have been developed, an easy-to-use tool to assign population groups based on genotyping data from SNP profiles is still missing and benchmarking for the validity of population assignment strategy for aberrant cancer genomes was not tested. Continue reading

2018 09 18 BIO390 Michael Baudis Introduction to Bioinformatics

UZH BIO390 "Introduction to Bioinformatics"

Bioinformatics - Introduction

Michael Baudis

Abstract

First lecture in the UZH BIO390 "Introduction to Bioinformatics" series, introducing concepts and scope of bioinformatics as a field - 2018 version.

Continue reading

ECCB 2018 - Beacon

Abstract: ECCB 2018

ELIXIR Beacon - A Driver Project for the Global Alliance for Genomics and Health

Michael Baudis for the ELIXIR Beacon Project

The Global Alliance for Genomics and Health (GA4GH) develops standards and guidelines to facilitate the international sharing of genomic and health related metadata. The creation of GA4GH work stream products is moved forward through driver projects, which address particular scientific, technical, regulatory or security related aspects of data access and sharing. Continue reading

Registered access: authorizing data access

Dyke SOM, Linden M, Lappalainen I, De Argila JR, Carey K, Lloyd D, Spalding JD, Cabili MN, Kerry G, Foreman J, Cutts T, Shabani M, Rodriguez LL, Haeussler M, Walsh B, Jiang X, Wang S, Perrett D, Boughtwood T, ..., Rehm HL, Baudis M, Sherry ST, Kato K, Knoppers BM, Baker D, and Flicek P

European Journal of Human Genetics (2018)

Abstract The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model—“registered access”—to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.

Continue reading

Mountains and Chasms - Surveying the Oncogenomic Publication Landscape

Carrio Cordo P and Baudis M. (2018)

Preprints 2018, 2018070618 (doi: 10.20944/preprints201807.0618.v1).
Oncology (2018; online Oct 26)

Abstract Cancers arise from the accumulation of somatic genome mutations, with varying contributions of intrinsic (i.e. genetic predisposition) and extrinsic (i.e. environmental) factors. For the understanding of malignant clones, precise information about their genomic composition has to be correlated with morphological, clinical and individual features, in the context of the available medical knowledge. Continue reading

Population assignment from cancer genome profiling data

Huang Q and Baudis M. (2018)

bioRxiv, 2018-07-14. doi:10.1101/368647

Abstract For a variety of human malignancies, incidence, treatment efficacy and overall prognosis show considerable variation between different populations and ethnic groups. Disentangling the effects related to particular population backgrounds can help in both understanding cancer biology and in tailoring therapeutic interventions. Because self-reported or inferred patient data can be incomplete or misleading due to migration and genomic admixture, a data-driven ancestry estimation should be preferred. While tools to map and utilize ancestry information from healthy individuals have been introduced, a population assignment based on genotyping data from somatic variation profiling of cancer samples is still missing. Continue reading

A harmonized meta-knowledgebase of clinical interpretations of cancer genomic variants

Wagner AH, Walsh B, Mayfield G, Tamborero D, Sonkin D, Krysiak K, Deu Pons J, Duren R, Gao J, McMurry J, Patterson S, Del Vecchio Fitz C, Sezerman OU, Warner J, Rieke DT, Aittokallio T, Cerami E, Ritter D, Schriml LM, Haendel M, Raca G, Madhavan S, Baudis M, ..., Griffith M, Griffith OL, and Margolin A

bioRxiv. doi:10.1101/366856

Precision oncology relies on the accurate discovery and interpretation of genomic variants to enable individualized therapy selection, diagnosis, or prognosis. However, knowledgebases containing clinical interpretations of somatic cancer variants are highly disparate in interpretation content, structure, and supporting primary literature, reducing consistency and impeding consensus when evaluating variants and their relevance in a clinical setting. Continue reading

Qingyao - Institute Progress Report

IMLS Progress Report

Towards understanding population effect on cancer

Qingyao Huang

Abstract

With a combination of ~50,000 curated oncogenomic array profiles from the arrayMap database and ~20,000 profiles from the TCGA project repository, we perform a meta-analysis to investigate the influence of genetic background on the CNV patterns in cancer. From sequencing data of 26 worldwide populations from the 1000 Genomes Project, we extract SNP markers and use them for subsequent sample analysis. Continue reading

The ELIXIR Beacon in 2018: A driver project of GA4GH

ELIXIR All Hands, Berlin

Michael Baudis

Abstract

The core mission of the Global Alliance for Genomics and Health is to "...enable genomic data sharing for the benefit of human health". One of the instruments to enact this mission is the selection and support of driver projects, which address particular scientific, technical, regulatory or security related aspects of federated access to human genomes and related metadata. Continue reading

segment_liftover : a Python tool to convert segments between genome assemblies

Abstract: ELIXIR All Hands 2018

Bo Gao, Qingyao Huang and Michael Baudis

The process of assembling a species’ reference genome may be performed in a number of iterations, with subsequent genome assemblies differing in the coordinates of mapped elements. The conversion of genome coordinates between different assemblies is required for many integrative and comparative studies. While currently a number of bioinformatics tools are available to accomplish this task, most of them are tailored towards the conversion of single genome coordinates. When converting the boundary positions of segments spanning larger genome regions, segments may be mapped into smaller sub-segments if the original segment’s continuity is disrupted in the target assembly. Continue reading

ELIXIR Beacon - A Driver Project for the Global Alliance for Genomics and Health

Abstract: ELIXIR All Hands 2018

Michael Baudis for the ELIXIR Beacon Project

The core mission of the Global Alliance for Genomics and Health is to "...enable genomic data sharing for the benefit of human health". One of the instruments to enact this mission is the selection and support of driver projects, which address particular scientific, technical, regulatory or security related aspects of federated access to human genomes and related metadata.

The "Beacon" project was initiated at the first GA4GH plenary in 2014 to test the willingness of genome resource providers to allow web-based queries, in the simplest format, against aggregated genome variant data. While this particular model had limited practical utility, it established the foundation for automated queries against worldwide data resources using a unified protocol. A secondary, intentional aspect of the Beacon project was to challenge existing notions of data security and privacy protection, including possible re-identification attacks and risk mitigation scenarios.

Now continued through the ELIXIR-Beacon GA4GH driver project, extensions to the Beacon protocol will provide real-world utility for genome resource access under a number of different usage scenarios. Through its support for developing the Beacon protocol and distributing it throughout its members, ELIXIR is providing one of the first widespread implementations of a GA4GH-related product. Additionally, the upcoming version of the Beacon protocol is expected to be one of the first official standards to undergo the GA4GH product approval process, facilitating widespread international adoption.

Continue reading

Biocuration for cancer genome databases: arrayMap and Progenetix

Abstract: ELIXIR All Hands 2018

Paula Carrio Cordo and Michael Baudis

Screening for somatic mutations in cancer has become integral to diagnostic and target evaluation for personalized therapeutic approaches. arrayMap is a curated oncogenomic resource, focusing on copy number aberration (CNA) profiles derived from genomic arrays. The information has been processed from data accessed through NCBI’s Gene Expression Omnibus (GEO), EBI’s ArrayExpress, and, importantly, through targeted mining of publication data. Whereas this database is based on raw probe data sets, the parental project, Progenetix, allows for genome variant analysis from additional sources and serves as metadata reference. Continue reading

ELIXIR All Hands 2018 - arrayMap

Abstract: ELIXIR All Hands 2018

Update on arrayMap Cancer Genome Resource

Qingyao Huang and Michael Baudis

arrayMap is a cancer-related genome profile database, curated from public data repositories. The resource’s data processing pipelines handle the homogeneous conversion of available raw data from various genomic array platforms (e.g. aCGH, SNP) and represent pre-computed copy number variation (CNV) profiles, allowing the evaluation of target gene involvement as well as the comparative analysis of whole-genome CNV patterns, e.g. for assessing between-sample CNV heterogeneity and associating sets of similar patterns with metadata qualifiers such as diagnostic classifications. In addition to the representation of cancer-related samples, we also plan to generate copy number profiles from non-cancer samples, in an effort to represent population-related CNV patterns. Continue reading
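Comparative analysis of whole-genome CNV patterns often starts from per-region frequencies: for each genomic interval, the fraction of samples showing a gain or a loss. The Python sketch below assumes profiles have already been binned into the same genomic intervals, with one status label per bin; the function name and input layout are illustrative, not arrayMap's actual data model.

```python
from typing import Dict, List, Sequence


def cnv_frequencies(profiles: Sequence[Sequence[str]]) -> Dict[str, List[float]]:
    """Per-bin gain and loss frequencies across samples.

    Each profile is one sample; each entry is "gain", "loss" or "neutral"
    for one genomic bin, and all profiles must use the same binning.
    """
    if not profiles:
        raise ValueError("no profiles given")
    n_bins = len(profiles[0])
    gains = [0] * n_bins
    losses = [0] * n_bins
    for profile in profiles:
        if len(profile) != n_bins:
            raise ValueError("profiles use different binnings")
        for i, status in enumerate(profile):
            if status == "gain":
                gains[i] += 1
            elif status == "loss":
                losses[i] += 1
    n = len(profiles)
    return {"gain": [g / n for g in gains], "loss": [x / n for x in losses]}


samples = [
    ["gain", "neutral", "loss"],
    ["gain", "loss", "loss"],
    ["neutral", "neutral", "gain"],
    ["gain", "neutral", "neutral"],
]
print(cnv_frequencies(samples))
```

Computing such frequencies separately per diagnostic group is one straightforward way to compare CNV patterns between sample sets.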

Genomic profiles of cancer cell lines - A systematic review

Abstract: ELIXIR All Hands 2018

Rahel Paloots, Qingyao Huang, Paula Carrio Cordo and Michael Baudis

Human cell lines are convenient model systems in cancer research, both for validating proposed molecular mechanisms and for evaluating potential therapeutic approaches, e.g. through high-throughput screening of potential antitumor compounds against cancer cell line panels. Thousands of established cell lines, covering a wide selection of cancer types, are available from commercial or academic providers. However, conclusions about biological pathways or pharmacological potential depend on a close molecular relation between the cancer type represented and the cell line model used for analyses.

Krüppel-Like Factor 10 participates in cervical cancer immunoediting through transcriptional regulation of Pregnancy-Specific Beta-1 Glycoproteins

Marrero-Rodríguez D, Taniguchi-Ponciano K, Subramaniam M, Hawse JR, Pitel KS, Arreola-De la Cruz H, Huerta-Padilla V, Ponce-Navarrete G, Figueroa-Corona MDP, Gomez-Virgilio L, Martinez-Cuevas TI, Mendoza-Rodriguez M, Rodriguez-Esquivel M, Romero-Morelos P, Ramirez-Salcedo J, Baudis M, Meraz-Rios M, Jimenez-Vega F, Salcedo M.

Abstract: Cervical cancer (CC) is associated with alterations in immune system balance, primarily due to a shift from Th1 to Th2 and an imbalance of Th17/Treg cells. Using in silico DNA copy number analysis, we have demonstrated that ~20% of CC samples exhibit gains of 8q22.3 and 19q13.31, the genomic regions encoding the KLF10 and PSG genes, respectively.


segment_liftover : a Python tool to convert segments between genome assemblies.

Gao B, Huang Q and Baudis M

Abstract: The process of assembling a species’ reference genome may be performed in a number of iterations, with subsequent genome assemblies differing in the coordinates of mapped elements. The conversion of genome coordinates between different assemblies is required for many integrative and comparative studies. While a number of bioinformatics tools are currently available to accomplish this task, most of them are tailored towards the conversion of single genome coordinates. When the boundary positions of segments spanning larger genome regions are converted, segments may be split into smaller sub-segments if the original segment’s continuity is disrupted in the target assembly. Such a conversion may cause substantial data loss in some circumstances, such as copy number variation (CNV) analysis, where the quantitative representation of a genomic region takes precedence over base-specific accuracy. segment_liftover aims at continuity-preserving remapping of genome segments between assemblies and provides features such as approximate locus conversion, automated batch processing and comprehensive logging to facilitate the processing of datasets containing large numbers of structural genome variants.
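The core idea of continuity-preserving remapping can be sketched in a few lines of Python: instead of lifting over every base of a segment, only its two boundaries are converted, with a stepwise search for a nearby mappable position when a boundary falls into an alignment gap. This is a simplified illustration, not segment_liftover's actual code; the block representation (ungapped, same-strand alignment blocks on one chromosome), the function names and the `step`/`max_shift` parameters are all assumptions.

```python
from typing import Optional

# Hypothetical alignment blocks between two assemblies on one chromosome:
# (source_start, source_end, target_start), half-open, same orientation.
Block = tuple[int, int, int]


def convert_position(pos: int, blocks: list[Block]) -> Optional[int]:
    """Map a single source coordinate, or return None if it falls in an alignment gap."""
    for src_start, src_end, tgt_start in blocks:
        if src_start <= pos < src_end:
            return tgt_start + (pos - src_start)
    return None


def convert_approximate(
    pos: int, blocks: list[Block], step: int = 100, max_shift: int = 10_000
) -> Optional[int]:
    """Try the exact position first, then probe outward in `step` increments."""
    for shift in range(0, max_shift + 1, step):
        for candidate in (pos - shift, pos + shift):
            mapped = convert_position(candidate, blocks)
            if mapped is not None:
                return mapped
    return None


def convert_segment(start: int, end: int, blocks: list[Block]) -> Optional[tuple[int, int]]:
    """Remap only the two boundaries so the segment stays continuous in the target."""
    new_start = convert_approximate(start, blocks)
    new_end = convert_approximate(end, blocks)
    if new_start is None or new_end is None or new_start >= new_end:
        return None  # in a batch run, such cases would be logged for review
    return new_start, new_end
```

With blocks `[(0, 1000, 5000), (1500, 3000, 6200)]`, the source segment from 200 to 2500 spans the unmapped region between 1000 and 1500 but still converts to the single target segment `(5200, 7200)`, whereas a per-base liftover would split it into two pieces (5200 to 6000 and 6200 to 7200). For CNV data, keeping one segment with slightly shifted boundaries is usually preferable to losing the region's continuity.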



DPPH - Data Protection in Personalized Health 2018, Lausanne

Genome Beacons for Data Discovery - Technical advances in a challenging environment

Michael Baudis

Abstract

The core mission of the Global Alliance for Genomics and Health is to "...enable genomic data sharing for the benefit of human health". One of the instruments for enacting this mission is the selection and support of driver projects, which address particular scientific, technical, regulatory or security-related aspects of federated access to human genomes and related metadata. The "Beacon" project was initiated at the first GA4GH plenary in 2014 to test the willingness of genome resource providers to allow web-based queries, in the simplest possible format, against aggregated genome variant data. While this particular model had limited practical utility, it established the foundation for automated queries against world-wide data resources using a unified protocol. A secondary, intentional aspect of the Beacon project was to challenge existing notions of data security and privacy protection, including the assessment of possible re-identification attacks and risk mitigation scenarios. Now that the project continues as the ELIXIR-Beacon GA4GH driver project, extensions to the Beacon protocol will provide real-world utility for genome resource access under a number of different usage scenarios, some of which will be addressed in this presentation.
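The simple query model described here can be illustrated with a toy, in-memory responder in Python: it is asked whether a specific allele is present in aggregated variant data and answers only yes or no, mirroring the `exists` flag of early Beacon responses. The class and field names are hypothetical simplifications rather than the GA4GH Beacon specification, and there is no web layer, authentication or dataset handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    reference_name: str   # chromosome name, e.g. "1"
    start: int            # position; the coordinate convention is an assumption here
    reference_bases: str
    alternate_bases: str


class MiniBeacon:
    """Answers only whether a variant occurs in the aggregated data, never who carries it."""

    def __init__(self, assembly_id: str, variants: list[Variant]) -> None:
        self.assembly_id = assembly_id
        # Aggregated index: sample identities are deliberately not stored.
        self._index = set(variants)

    def query(self, assembly_id: str, variant: Variant) -> dict[str, Optional[object]]:
        if assembly_id != self.assembly_id:
            # No answer is possible for coordinates on a different assembly.
            return {"exists": None, "error": "unsupported assembly"}
        return {"exists": variant in self._index}
```

For example, `MiniBeacon("GRCh38", [Variant("1", 1000, "A", "G")]).query("GRCh38", Variant("1", 1000, "A", "G"))` returns `{"exists": True}`, and the same query with alternate base `"T"` returns `{"exists": False}`. Even such a yes/no answer leaks some information about the underlying samples, which is the kind of re-identification risk the abstract refers to.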
