The GA4GH Phenopacket schema defines a computable representation of clinical data

Phenopackets v2 publication

Cell Genomics logo

Jacobsen JOB, Baudis M, Baynam GS, Beckmann JS, Beltran S, Buske OJ, Callahan TJ, Chute CG, Courtot M, Danis D, Elemento O, Essenwanger A, Freimuth RR, ... , Haendel MA, Robinson PN, The GAGHPMC.

Abstract: Despite great strides made in the development and wide acceptance of standards for exchanging structured information about genomic variants, progress in standards for computational phenotype analysis for translational genomics has lagged behind. Phenotypic features (signs, symptoms, laboratory and imaging findings, results of physiological tests, etc.) are of high clinical importance, yet exchanging them in conjunction with genomic variation information is often overlooked or even neglected. In the clinical domain, substantial work has been dedicated to the development of computational phenotypes. Traditionally, these approaches have largely relied on rule-based methods and large sources of clinical data to identify cohorts of patients with or without a specific disease. However, they were not developed to enable deep phenotyping of abnormalities, to facilitate computational analysis of interpatient phenotypic similarity or to support computational decision support. To address this, the Global Alliance for Genomics and Health (GA4GH) has developed the Phenopacket schema, which supports the exchange of computable longitudinal case-level phenotypic information for diagnosis of, and research on, all types of disease, including Mendelian and complex genetic diseases, cancers and infectious diseases. A Phenopacket characterizes an individual person or biosample, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses and treatments.

The Phenopacket software is available at github.com/phenopackets/.
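
To make the abstract's last point concrete, the following is a minimal sketch of how one individual might be linked to a phenotypic feature and a diagnosis in a Phenopacket-style JSON document. It uses only the Python standard library, not the Phenopacket software itself. The helper `build_phenopacket`, the identifiers `example-packet-1` and `patient-1`, and the metadata values are hypothetical. The field names reflect one reading of the v2 schema's JSON form and should be checked against the official schema documentation; required elements such as the metadata resource list are omitted for brevity.

```python
import json


def build_phenopacket(packet_id, subject_id, features, diseases):
    """Assemble a minimal Phenopacket-like dict (hypothetical helper).

    `features` and `diseases` are lists of (ontology_id, label) pairs.
    Field names are assumed from the v2 schema's JSON representation.
    """
    return {
        "id": packet_id,
        # The individual the packet describes.
        "subject": {"id": subject_id},
        # Each phenotypic feature is typed by an ontology term.
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in features
        ],
        # Each diagnosis is likewise an ontology term.
        "diseases": [
            {"term": {"id": term_id, "label": label}}
            for term_id, label in diseases
        ],
        # Illustrative metadata only; a real packet would also list the
        # ontology resources it references.
        "metaData": {
            "created": "2022-01-01T00:00:00Z",
            "createdBy": "example-curator",
            "phenopacketSchemaVersion": "2.0",
        },
    }


packet = build_phenopacket(
    packet_id="example-packet-1",
    subject_id="patient-1",
    features=[("HP:0001166", "Arachnodactyly")],
    diseases=[("OMIM:154700", "Marfan syndrome")],
)

# Serialize to JSON, the form in which such a record could be exchanged.
serialized = json.dumps(packet, indent=2)
print(serialized)
```

Because the record is plain JSON built from ontology identifiers, it can be compared or exchanged without parsing free text, which is the kind of computability the abstract refers to.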