Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines

Literature-derived annotations as entry point for data exploration

Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis and Kurt Stockinger

Bioinformatics Advances, vbae045,
Previous arXiv preprint (2023-07-03):

Motivation: With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in number and volume (Lubowitz et al., 2021). As a consequence, in the fields of biological, medical and clinical research, domain experts have to sift through massive amounts of scientific text to find relevant information. However, this process is extremely tedious and slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction.

Results: In this work, we present the design, implementation and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data in the domain of cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variant plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidence, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.

Availability and Implementation: Our system is publicly available on the web at

Contact: The authors can be contacted at or