Introduction: Manual ICD-10 coding of German clinical texts is time-consuming and error-prone. This project aims to develop a semi-automated pipeline for efficient coding of unstructured medical documentation.
State Of The Art: Existing approaches often rely on fine-tuned language models that require large datasets and perform poorly on rare codes, particularly in low-resource languages such as German.
Concept: The proposed system integrates Named Entity Recognition, semantic and lexical retrieval, abbreviation resolution, and context-aware normalization within a Retrieval-Augmented Generation (RAG) framework using a compact generative model.
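The retrieval part of this concept can be shown with a minimal sketch that has no dependencies. Abbreviations are expanded first. Each ICD-10 description is then scored by a weighted sum of a lexical score and a semantic score. Everything here is an assumption made for illustration. The abbreviation table and the three-entry catalogue are hypothetical toys. Jaccard word overlap stands in for the lexical retriever, and character-trigram cosine stands in for Sentence-BERT embeddings. The weight `alpha` is not a value reported by the authors.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table (lower-case keys); a real system would draw on a
# curated German clinical abbreviation lexicon.
ABBREVIATIONS = {
    "dm": "diabetes mellitus",
    "aht": "arterielle hypertonie",
    "khk": "koronare herzkrankheit",
}

# Toy ICD-10 catalogue (code -> simplified, lower-cased German description).
ICD_CATALOGUE = {
    "E11.9": "diabetes mellitus typ 2 ohne komplikationen",
    "I10": "essentielle primäre hypertonie",
    "I25.9": "chronische ischämische herzkrankheit",
}


def expand_abbreviations(mention: str) -> str:
    """Lower-case, tokenize, and replace known abbreviations with their expansions."""
    tokens = re.findall(r"\w+", mention.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def lexical_score(query: str, description: str) -> float:
    """Jaccard overlap between the word sets of query and description."""
    q, d = set(query.split()), set(description.split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def trigram_counts(text: str) -> Counter:
    """Character trigram counts: a crude stand-in for a dense sentence embedding."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def semantic_score(query: str, description: str) -> float:
    """Cosine similarity of trigram count vectors (placeholder for Sentence-BERT cosine)."""
    a, b = trigram_counts(query), trigram_counts(description)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_retrieve(mention: str, k: int = 2, alpha: float = 0.5) -> list:
    """Rank catalogue codes by a weighted sum of semantic and lexical scores."""
    query = expand_abbreviations(mention)
    scored = [
        (alpha * semantic_score(query, desc) + (1 - alpha) * lexical_score(query, desc), code)
        for code, desc in ICD_CATALOGUE.items()
    ]
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]
```

For example, `hybrid_retrieve("KHK")` first expands the mention to "koronare herzkrankheit" and then ranks `I25.9` highest in this toy catalogue. Linear score fusion is used here only because it is the simplest option. Rank-based fusion would be an equally plausible choice.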
Implementation: The pipeline uses Sentence-BERT embeddings, FAISS indexing, and the Mistral-Small-Instruct model. ICD codes are assigned by combining semantic similarity with generative refinement over the top-ranked retrieval candidates.
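The two-stage assignment step can be sketched as follows: shortlist candidates by semantic similarity, then let a generative model choose among them. This is a sketch under stated assumptions, not the authors' code. A FAISS `IndexFlatIP` built over L2-normalized vectors computes the same exact inner-product (cosine) ranking. Here that index is replaced by a pure-Python loop so the example runs without dependencies. The embedding vectors, the prompt wording, the code-matching regex, the fallback rule, and the `llm` callable (standing in for Mistral-Small-Instruct) are all hypothetical.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Rough ICD-10 code shape (letter, two digits, optional decimal part); an assumption for parsing.
ICD_PATTERN = re.compile(r"[A-Z]\d{2}(?:\.\d{1,2})?")


def normalize(vec: Sequence[float]) -> List[float]:
    """L2-normalize so that the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_inner_product(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact inner-product search over normalized vectors, i.e. cosine ranking."""
    q = normalize(query)
    scores = [(code, sum(a * b for a, b in zip(q, normalize(vec)))) for code, vec in index.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


def build_prompt(mention: str, candidates: List[Tuple[str, float]], descriptions: Dict[str, str]) -> str:
    """Hypothetical prompt asking the model to pick one code from the shortlist."""
    lines = [f"Diagnosis: {mention}", "Choose exactly one ICD-10 code from this list:"]
    lines += [f"- {code}: {descriptions[code]}" for code, _ in candidates]
    lines.append("Answer with the code only.")
    return "\n".join(lines)


def assign_code(
    mention: str,
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    descriptions: Dict[str, str],
    llm: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve top-k candidates, let the generator choose, and fall back to the best match."""
    candidates = top_k_inner_product(query_vec, index, k)
    if not candidates:
        raise ValueError("index is empty")
    answer = llm(build_prompt(mention, candidates, descriptions))
    allowed = {code for code, _ in candidates}
    # Accept the first code in the answer that belongs to the retrieved shortlist.
    for code in ICD_PATTERN.findall(answer):
        if code in allowed:
            return code
    # The model answered outside the shortlist (or unparseably): keep the top semantic hit.
    return candidates[0][0]
```

Restricting the model's answer to the retrieved shortlist keeps the generator from producing codes that retrieval never proposed. The fallback guarantees that a code is always returned.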
Lessons Learned: Major sources of error were found in semantic retrieval and diagnosis normalization. Future improvements should focus on domain-specific German embeddings, more robust abbreviation handling, and enhanced context-aware prompting to increase accuracy and usability in clinical environments.
DOI: http://dx.doi.org/10.3233/SHTI251397
J Med Internet Res
September 2025
School of Governance and Policy Science, The Chinese University of Hong Kong, Hong Kong, China (Hong Kong).
Background: Older adults are more vulnerable to severe consequences caused by seasonal influenza. Although seasonal influenza vaccination (SIV) is effective and free vaccines are available, the SIV uptake rate remained inadequate among people aged 65 years or older in Hong Kong, China. There was a lack of studies evaluating ChatGPT in promoting vaccination uptake among older adults.
JCO Clin Cancer Inform
September 2025
Department of Applied AI and Data Science, City of Hope, Duarte, CA.
Purpose: Recent advancements in retrieval-augmented generation (RAG) and large language models (LLMs) have revolutionized the extraction of real-world evidence from unstructured electronic health records (EHRs) in oncology. This study aims to enhance RAG's effectiveness by implementing a retriever encoder specifically designed for oncology EHRs, with the goal of improving the precision and relevance of retrieved clinical notes for oncology-related queries.
Methods: Our model was pretrained on more than six million oncology notes from 209,135 patients at City of Hope.
Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data).
J Clin Neurosci
September 2025
Nordwest-Krankenhaus Sanderbusch, Friesland Kliniken gGmbH, Department of Neurosurgery, Sande, Germany. Electronic address:
Background: Large language models (LLMs), with their remarkable ability to retrieve and analyse information within seconds, are generating significant interest in healthcare. This study aims to assess and compare the accuracy, completeness, and usefulness of the responses of Gemini Advanced, ChatGPT-3.5, and ChatGPT-4 in neuro-oncology cases.
Front Artif Intell
August 2025
Bayer Research and Development, Pharmaceuticals, Preclinical Development, Berlin, Germany.
The pharmaceutical industry faces pressure to improve the drug development process while reducing costs in an evolving regulatory landscape. This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted data integration platform developed by Bayer AG in collaboration with Thoughtworks. PRINCE integrates decades of structured and unstructured safety study reports, leveraging a multi-agent architecture based on Large Language Models (LLMs) and advanced data retrieval methodologies, such as Retrieval-Augmented Generation and Text-to-SQL.