LabelSeg: segment annotation for tumor copy number alteration profiles

A tool to assign relative SCNA levels to segments

Zhao H and Baudis M


Abstract

Somatic copy number alterations (SCNA) are a predominant type of oncogenomic alteration, affecting a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations, generating a large number of SCNA profiles. However, annotating and integrating these profiles is challenging because of multiple sources of noise and the heterogeneity of measurement platforms. In this study, we present LabelSeg, an algorithm for fast and accurate annotation of CNA segments that improves the interpretation of tumor SCNA profiles. LabelSeg takes segment files as input, estimates calling thresholds for identifying different relative copy number states, and is compatible with most CNA measurement platforms. We confirmed its performance on simulated data and on sample-derived data from The Cancer Genome Atlas (TCGA) reference dataset, and we demonstrated its utility in integrating heterogeneous segment profiles from different data sources. Our comparative and integrative analysis revealed common SCNA patterns in cancer, as well as protein-coding genes with a strong correlation between SCNA and mRNA expression, promoting the investigation of the role of SCNA in cancer development.
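
To make the idea of "calling thresholds" concrete, the following is a minimal sketch of how relative copy number labels could be assigned to segments once thresholds are known. It is not LabelSeg's implementation: the `Segment` class, the `label_segment` function, the label names, and the fixed threshold values are all hypothetical, chosen only for illustration. LabelSeg estimates its thresholds from each segment profile, whereas this sketch simply takes them as given.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One row of a segment file (hypothetical minimal layout)."""
    chrom: str
    start: int
    end: int
    log_ratio: float  # segment mean, assumed here to be a log2 ratio


def label_segment(log_ratio: float,
                  low_gain: float = 0.15,
                  high_gain: float = 0.7,
                  low_loss: float = -0.15,
                  high_loss: float = -1.0) -> str:
    """Map a segment mean to a relative copy number label.

    The threshold defaults are illustrative assumptions, not values
    taken from the paper.
    """
    if log_ratio >= high_gain:
        return "high-level gain"
    if log_ratio >= low_gain:
        return "gain"
    if log_ratio <= high_loss:
        return "high-level loss"
    if log_ratio <= low_loss:
        return "loss"
    return "neutral"


# Made-up example segments, for illustration only.
segments = [
    Segment("1", 1_000_000, 5_000_000, 0.02),
    Segment("8", 100_000_000, 140_000_000, 0.9),
    Segment("9", 20_000_000, 23_000_000, -1.2),
    Segment("17", 5_000_000, 9_000_000, -0.4),
]

labels = [label_segment(s.log_ratio) for s in segments]
for seg, lab in zip(segments, labels):
    print(f"chr{seg.chrom}:{seg.start}-{seg.end}\t{seg.log_ratio:+.2f}\t{lab}")
```

Because measurement platforms differ in noise and dynamic range, fixed thresholds like these do not transfer well between profiles, which is the motivation for estimating them per profile as the abstract describes.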