Human cell lines are convenient model systems in cancer research, both for validating proposed molecular mechanisms and for evaluating potential therapeutic approaches, e.g. through high-throughput screening of potential antitumor compounds against cancer cell line panels. Thousands of established cell lines are available from commercial or academic providers, covering a wide selection of cancer types. However, conclusions about biological pathways or pharmacological potential depend on a close molecular relationship between the cancer type represented and the cell line model used for analysis.
The validity of cell line-based observations therefore depends A) on correct identification and lineage assignment, and B) on the conservation of cancer-specific molecular alterations during cell line propagation. However, according to the International Cell Line Authentication Committee (ICLAC), a text-based survey of ~32,000 publications for 488 cell lines found a large proportion of cell lines to be misidentified.
In the current study, we perform a data-driven assessment of heterogeneity and similarity between cell lines, as well as a comparison of cell lines with primary tumor samples of the same tumor type. Our study is based on the curation of genomic copy number profiles and metadata from 3675 genome profiling experiments in our arrayMap resource, representing 1539 distinct cell lines as identified in the Cellosaurus database.
In our analysis, we compare the genomic heterogeneity of “idem” cell lines, both to establish measures for dynamic genome changes related to cell line propagation and to detect instances of mis-assignment based on strong divergence between genomic profiles. Copy number profiles of cell lines are compared with those of matched primary tumors, to establish a measure of genomic similarity between these model systems and their target tissues. To evaluate our findings, the corresponding publications and related resource entries (GEO, Cellosaurus) are reviewed for possible causes of misidentification or for technical peculiarities.
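As an illustration only, the divergence-based screening of “idem” cell line profiles described above could be sketched as follows. The similarity measure (Pearson correlation of binned log2 copy number ratios), the use of a median over pairwise correlations, the threshold of 0.5, all function and variable names, and the toy values are assumptions made for this sketch; they are not taken from the study's actual pipeline.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2 bins)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def flag_divergent(profiles, min_median_corr=0.5):
    """Return experiment ids whose median correlation to all other
    experiments of the same cell line is below `min_median_corr`.

    `profiles` maps a (hypothetical) experiment id to a list of per-bin
    log2 copy number ratios; all lists must use the same genomic bins.
    """
    ids = sorted(profiles)
    if len(ids) < 2:
        return []  # nothing to compare against
    pairwise = {i: [] for i in ids}
    for a, b in combinations(ids, 2):
        r = pearson(profiles[a], profiles[b])
        pairwise[a].append(r)
        pairwise[b].append(r)
    return [i for i in ids if median(pairwise[i]) < min_median_corr]


# Toy data (assumed values): three concordant experiments labelled as the
# same cell line, and one experiment with a roughly inverted profile.
example_profiles = {
    "exp_a": [0.0, 0.5, 0.5, -0.6, 0.0, 0.4],
    "exp_b": [0.1, 0.4, 0.6, -0.5, 0.0, 0.5],
    "exp_c": [0.0, 0.6, 0.4, -0.7, 0.1, 0.3],
    "exp_x": [0.5, -0.4, -0.5, 0.3, 0.6, -0.6],
}
flagged = flag_divergent(example_profiles)
print(flagged)
```

The median is used rather than the mean so that a single strongly divergent experiment does not pull down the scores of the concordant ones. With only two experiments per cell line, a disagreement flags both, and deciding which one is mis-assigned would still require the manual review of publications and resource entries.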
Finally, we will provide reference copy number variation profiles for cancer cell lines as a resource linked to the Cellosaurus cell line database, to reduce erroneous cell line use in future research studies and to provide additional information for the interpretation of data derived from these model systems.