Background: Human cell lines are convenient model systems in cancer research, both for validating proposed molecular mechanisms and for evaluating potential therapeutic approaches, e.g. through high-throughput screening of potential anti-tumour compounds against cancer cell line panels. However, conclusions about biological pathways or pharmacological potential depend on a close molecular relationship between the cancer type represented and the cell line model used for analysis. The validity of cell line-based observations therefore depends A) on correct identification and lineage assignment, and B) on the conservation of cancer-specific molecular alterations during cell line propagation. Yet according to the International Cell Line Authentication Committee (ICLAC), a large proportion of cell lines have been found to be misidentified.
Methods: Our study is based on 3675 genomic copy number variation (CNV) profiles from the arrayMap resource, representing 1539 distinct cell lines as identified in the Cellosaurus database. In addition, manually curated ontology terms based on the retrieved metadata are available, as defined by the NCI Thesaurus (NCIt) and the International Classification of Diseases for Oncology (ICD-O). For germline analysis, we performed genome-wide phasing and imputation to integrate samples from various array platforms. We then estimated kinship and built phylogenetic trees based on germline similarity.
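The germline comparison step can be illustrated with a minimal sketch. It assumes genotypes have already been phased, imputed and coded as alternate-allele counts (0, 1, 2) on a shared set of sites, and it uses mean identity-by-state (IBS) as a simplified stand-in for a kinship estimate; the study's actual estimator is not specified here. The sample names and genotype values are hypothetical.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean identity-by-state between two genotype vectors coded 0/1/2.

    Sites that are missing (None) in either sample are skipped.
    Returns a value in [0, 1]; 1.0 means identical genotypes.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        raise ValueError("no overlapping genotyped sites")
    # Each site contributes 0, 1 or 2 shared alleles.
    return sum(2 - abs(a - b) for a, b in shared) / (2 * len(shared))


def pairwise_ibs(genotypes):
    """Return {(sample_a, sample_b): IBS} for all unordered sample pairs."""
    return {
        (s1, s2): ibs_similarity(genotypes[s1], genotypes[s2])
        for s1, s2 in combinations(sorted(genotypes), 2)
    }


# Hypothetical toy data: two profiles of one cell line and one unrelated line.
genotypes = {
    "lineA_rep1": [0, 1, 2, 2, 0, 1, 0, 2],
    "lineA_rep2": [0, 1, 2, 2, 0, 1, 0, 2],
    "lineB_rep1": [2, 1, 0, 0, 2, 1, 2, 0],
}

sims = pairwise_ibs(genotypes)
for pair, value in sims.items():
    print(pair, round(value, 3))
```

A pairwise similarity matrix of this kind is the usual input for distance-based tree building, with distance taken as one minus similarity.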
Results: We confirmed the close relatedness among samples of the same cell line as well as between cell lines of similar ethnic origin. For somatic variation, we clustered CNV patterns and evaluated the consistency between cell lines and the primary tumour sample data of the same origin.
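The consistency check between cell lines and primary tumour data can be sketched as follows. This is an illustrative assumption rather than the study's method: each CNV profile is represented as a vector of log2 copy number ratios over the same genomic bins, each tumour type is summarised by the mean of its primary tumour profiles, and a cell line is compared with each summary by Pearson correlation. All names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sx * sy)


def centroid(profiles):
    """Bin-wise mean of several CNV profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def best_matching_type(profile, tumours_by_type):
    """Return the tumour type whose mean profile correlates best, plus all scores."""
    scores = {
        t: pearson(profile, centroid(ps)) for t, ps in tumours_by_type.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical log2 ratios over four genomic bins.
tumours_by_type = {
    "typeX": [[0.5, 0.4, -0.6, 0.0], [0.6, 0.5, -0.5, 0.1]],
    "typeY": [[-0.5, 0.0, 0.6, 0.4], [-0.4, 0.1, 0.5, 0.5]],
}
cell_line_profile = [0.7, 0.3, -0.8, 0.0]
annotated_type = "typeX"

best, scores = best_matching_type(cell_line_profile, tumours_by_type)
if best != annotated_type:
    print("possible mis-assignment:", annotated_type, "vs", best)
else:
    print("profile consistent with annotated type:", best)
```

A cell line whose best-matching type differs from its annotation is only a candidate for closer inspection, since propagation-related changes can also shift a profile away from its tumour of origin.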
Conclusions: We analysed both somatic variation and germline signatures within the same cell lines, to evaluate dynamic genome changes related to cell line propagation and to identify instances of mis-assignment based on strong divergence of genomic profiles. As a research output, we provide the germline lineage tree and somatic CNV clustering for available cancer cell lines as an online resource.