Article Abstract

Background: Analysis of Electronic Health Records (EHRs) is crucial in real-world evidence (RWE), especially in oncology, as it provides valuable insights into the complex nature of the disease. The implementation of advanced techniques for automated extraction of structured information from textual data potentially enables access to expert knowledge in highly specialized contexts. In this paper, we introduce MISTIC, a Natural Language Processing (NLP) approach to classify the presence or absence of metastasis in Italian EHRs, in the breast cancer domain.

Methods: Our approach consists of a transformer-based framework designed for few-shot learning, requiring a small labelled dataset and minimal computational resources for training. The pipeline includes text segmentation to improve model processing and topic analysis to filter informative content, ensuring relevant input data for classification.
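The segment-filter-classify flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the topic vocabulary, the splitting rule, and the `toy_segment_classifier` stand-in are hypothetical and are not taken from the paper, whose classifier is a few-shot transformer model.

```python
import re
from typing import Callable, List

# Hypothetical topic vocabulary (assumption); the paper's topic analysis
# is not specified in the abstract.
TOPIC_TERMS = ("metastasi", "metastatic", "secondarism")


def segment(report: str) -> List[str]:
    """Split a report into sentence-like segments after '.' or ';', or on newlines."""
    parts = re.split(r"(?<=[.;])\s+|\n+", report)
    return [p.strip() for p in parts if p.strip()]


def is_informative(seg: str) -> bool:
    """Keep only segments that mention at least one topic term."""
    low = seg.lower()
    return any(term in low for term in TOPIC_TERMS)


def toy_segment_classifier(seg: str) -> bool:
    """Placeholder for the few-shot transformer (assumption): treats a segment
    as positive unless it contains a simple negation cue."""
    low = seg.lower()
    return not any(cue in low for cue in ("assenza", "non ", "nessun"))


def classify_report(report: str, segment_classifier: Callable[[str], bool]) -> bool:
    """Flag the report as positive if any informative segment is classified positive."""
    informative = [s for s in segment(report) if is_informative(s)]
    return any(segment_classifier(s) for s in informative)


positive = classify_report(
    "Paziente in buone condizioni. Presenza di metastasi epatiche.",
    toy_segment_classifier,
)
negative = classify_report("Assenza di metastasi a distanza.", toy_segment_classifier)
```

Passing the classifier in as a callable keeps the segmentation and filtering steps independent of the model, so a trained few-shot model could replace the placeholder without changing the rest of the sketch.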

Results: MISTIC was evaluated across multiple data sources and compared with several benchmark methods: a pattern-matching system built on regex and semantic rules, BERT-based models in a zero-shot learning setup, and Large Language Models (LLMs). The results show that our approach generalizes, achieving an F-Score above 87% on every source and outperforming the other methods, with an overall F-Score of 91.2%.
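For readers unfamiliar with the metric, an F-Score can be computed from confusion counts as in the minimal sketch below. The abstract does not state which variant or averaging scheme was used, so the balanced F1 (`beta=1.0`) is an assumption here, and the counts in the usage line are made up for illustration, not results from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was correctly found
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (assumption): 8 hits, 2 false alarms, 2 misses.
example = f_score(tp=8, fp=2, fn=2)
```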

Conclusions: MISTIC achieves high performance in the Italian metastasis classification task, outperforming rule-based systems, zero-shot BERT models, and LLMs. Its few-shot learning setup offers a computationally efficient alternative to large-scale models, while its segmentation and topic analysis steps enhance explainability by explicitly linking predictions to key textual elements. Furthermore, MISTIC demonstrates strong generalization across different data sources, reinforcing its potential as a scalable and transparent solution for clinical text classification. By extracting high-quality metastatic information from diverse textual data, MISTIC supports medical researchers in analyzing unstructured and highly informative content across a wide range of medical reports. In doing so, it enhances data accessibility and interpretability, addressing a critical gap in health informatics and clinical practice.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11987267
DOI: http://dx.doi.org/10.1186/s12911-025-02994-w

Publication Analysis

Top Keywords

metastasis classification
electronic health
health records
textual data
few-shot learning
topic analysis
informative content
data sources
learning setup
models llms

Similar Publications

Objectives: The 9th edition of the Tumor, Node, Metastasis (TNM-9) lung cancer classification is set to replace the 8th edition (TNM-8) starting in 2025. Key updates include the splitting of the mediastinal nodal category N2 into single- and multiple-station involvement, as well as the classification of multiple extrathoracic metastatic lesions as involving a single organ system (M1c1) or multiple organ systems (M1c2). This study aimed to assess how the TNM-9 revisions affect the final staging of lung cancer patients and how these changes correlate with overall survival (OS).

Background: Staging laparoscopy (SL) is an essential procedure for peritoneal metastasis (PM) detection. Although surgeons are expected to differentiate between benign and malignant lesions intraoperatively, this task remains difficult and error-prone. The aim of this study was to develop a novel multimodal machine learning (MML) model to differentiate PM from benign lesions by integrating morphologic characteristics with intraoperative SL images.

Background: Tumor deposit (TD) is an independent risk factor associated with recurrence or metastasis for patients with colorectal cancer (CRC). The scenario in which both TD and lymph node metastasis (LNM) are positive is not clearly illustrated by the current TNM staging system. Simply treating one TD as one or two LNMs by a weighting factor is inappropriate.

Unravelling novel microbial players in the breast tissue of TNBC patients: a meta-analytic perspective.

NPJ Biofilms Microbiomes

September 2025

Bioinformatics Group, Centre for Informatics Science (CIS), School of Information Technology and Computer Science (ITCS), Nile University, Giza, Egypt.

Triple-negative breast cancer (TNBC) is the most aggressive subtype of breast cancer (BC), accounting for nearly 40% of BC-related deaths. Emerging evidence suggests that the breast tissue microbiome harbors distinct microbial communities; however, the microbiome specific to TNBC remains largely unexplored. This study presents the first comprehensive meta-analysis of the TNBC tissue microbiome, consolidating 16S rRNA amplicon sequencing data from 200 BC samples across four independent cohorts.
