Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and has promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles poses great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have focused on the association of CNA with pre-selected “driver” genes, with limited application to rare drivers and other genomic elements.
In this study, we developed a bioinformatics pipeline to integrate a collection of 44,988 high-quality CNA profiles of high diversity. Using a hybrid model combining neural networks with an attention algorithm, we generated the CNA signatures of 31 cancer subtypes, depicting the uniqueness of their respective CNA landscapes. Finally, we constructed a multi-label classifier to identify the cancer type and the organ of origin from copy number profiling data. The investigation of the signatures suggested common patterns, not only among physiologically related cancer types but also among clinico-pathologically distant cancer types, such as different cancers originating from the neural crest. Further experiments with classification models confirmed the effectiveness of the signatures in distinguishing different cancer types and demonstrated their potential in tumor classification.
Note: The original preprint was “Genomic Copy Number Signatures Based Classifiers for Subtype Identification in Cancer”, bioRxiv, 2020-12-18.