Copy number variant heterogeneity among cancer types reflects inconsistent concordance with diagnostic classifications

Paula Carrio Cordo and Michael Baudis

bioRxiv. doi:

This article explores the correlation between subsets of cancer entities, grouped by their somatic CNV patterns, and levels of diagnostic classification systems.


Due to frequent genome instability and the accumulation of mutations during the neoplastic process, malignant tumors present with patterns of somatic genome variants that are heterogeneous on multiple levels. Delineating the pathophysiological consequences of these patterns remains one of the main challenges in cancer prognosis and treatment. Although continuous efforts aim at a better characterization of cancer entities through the inclusion of molecular characteristics, current ontology systems still rely heavily on clinico-pathological features. Traditionally, malignant diseases have been classified using domain-specific or generalized classification systems based on histopathological features and clinical gestalt. Aside from the general-purpose 'International Classification of Diseases for Oncology' (ICD-O; WHO), hierarchical terminologies such as the NCI Thesaurus (NCIt) promote data interoperability and ontology-driven computational analysis. To evaluate these two prominent, general cancer classification systems (NCIt and ICD-O) for their concordance with genomic mutation patterns, we have performed a data-driven meta-analysis of 83,505 curated cancer samples with genome-wide copy number aberration (CNA) profiles from our Progenetix database. The analysis provides a basis for assessing how well existing classification systems correspond to homogeneous molecular groups, and whether individual codes provide an adequately detailed classification.