Article Abstract

Objectives: Interpreting skin findings can be challenging for both laypersons and clinicians. Large language models (LLMs) offer accessible decision support, yet their diagnostic capabilities for dermatological images remain underexplored. This study evaluated the diagnostic performance of LLMs based on image interpretation of common dermatological diseases.

Methods: A total of 500 dermatological images, encompassing four prevalent skin conditions (psoriasis, vitiligo, erysipelas and rosacea), were used to compare seven multimodal LLMs (GPT-4o, GPT-4o mini, Gemini 1.5 Pro, Gemini 1.5 Flash, Claude 3.5 Sonnet, Llama3.2 90B and 11B). A standardized prompt was used to generate one top diagnosis.
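As a rough illustration of how a single top diagnosis could be scored, the sketch below maps a free-text model reply onto one of the four study conditions. The function name `normalize_diagnosis`, the keyword matching, the refusal phrases, and the `declined`/`other` labels are all assumptions made for illustration; the abstract does not describe the study's prompt or its scoring procedure.

```python
# The four conditions named in the Methods section.
CONDITIONS = ("psoriasis", "vitiligo", "erysipelas", "rosacea")

# Hypothetical refusal phrases; not taken from the study.
REFUSAL_MARKERS = ("cannot", "can't", "unable to", "not able to")


def normalize_diagnosis(reply: str) -> str:
    """Map a free-text reply to one condition, 'declined', or 'other'."""
    text = reply.lower()
    mentioned = [c for c in CONDITIONS if c in text]
    if len(mentioned) == 1:
        # Exactly one study condition named: treat it as the top diagnosis.
        return mentioned[0]
    if not mentioned and any(marker in text for marker in REFUSAL_MARKERS):
        # No condition named and the reply looks like a refusal.
        return "declined"
    # Ambiguous replies or diagnoses outside the four conditions.
    return "other"
```

Tracking refusals as a separate label keeps them distinguishable from wrong diagnoses, which matters when a model declines to answer for some images.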

Results: The highest overall accuracy was achieved by GPT-4o (67.8 %), followed by GPT-4o mini (63.8 %) and Llama3.2 11B (61.4 %). Accuracy varied considerably across conditions: psoriasis had the highest mean LLM accuracy (59.2 %) and erysipelas the lowest (33.4 %). Of all images, 11.0 % were misdiagnosed by all LLMs, whereas 11.6 % were correctly diagnosed by all models. Correct diagnoses by all LLMs were linked to clear, disease-specific features, such as sharply demarcated erythematous plaques in psoriasis. Llama3.2 90B was the only LLM to decline to diagnose images, particularly those involving intimate areas of the body.
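The reported quantities (overall accuracy per model, mean accuracy per condition, and the share of images that every model got right or wrong) can be computed along the following lines. This is a minimal sketch: the function `summarize`, the data layout, and the toy inputs are assumptions for illustration and are not the study's data or code.

```python
from collections import defaultdict


def summarize(truth, predictions):
    """truth: {image_id: condition}; predictions: {model: {image_id: label}}."""
    n = len(truth)
    # Overall accuracy per model: share of images with the correct top diagnosis.
    per_model = {
        model: sum(preds.get(img) == label for img, label in truth.items()) / n
        for model, preds in predictions.items()
    }
    # Mean accuracy across models per condition. Pooling hits works here
    # because every model is scored on the same set of images.
    hits, totals = defaultdict(int), defaultdict(int)
    for preds in predictions.values():
        for img, label in truth.items():
            totals[label] += 1
            hits[label] += int(preds.get(img) == label)
    per_condition = {c: hits[c] / totals[c] for c in totals}
    # Share of images diagnosed correctly (or incorrectly) by every model.
    all_correct = sum(
        all(p.get(img) == label for p in predictions.values())
        for img, label in truth.items()
    ) / n
    all_wrong = sum(
        all(p.get(img) != label for p in predictions.values())
        for img, label in truth.items()
    ) / n
    return {
        "per_model": per_model,
        "per_condition": per_condition,
        "all_correct": all_correct,
        "all_wrong": all_wrong,
    }


# Toy inputs (hypothetical, four images and two models).
truth = {"img1": "psoriasis", "img2": "rosacea", "img3": "erysipelas", "img4": "vitiligo"}
predictions = {
    "model_a": {"img1": "psoriasis", "img2": "rosacea", "img3": "other", "img4": "vitiligo"},
    "model_b": {"img1": "psoriasis", "img2": "other", "img3": "rosacea", "img4": "declined"},
}
summary = summarize(truth, predictions)
```

A refusal or a missing answer counts as incorrect in this sketch, so a model that declines often is penalized in overall accuracy; whether the study scored refusals this way is not stated in the abstract.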

Conclusions: LLM performance varied significantly, emphasizing the need for cautious usage. Notably, a free, locally hostable model (Llama3.2 11B) correctly identified the top diagnosis for 61.4 % of all images, demonstrating the potential for safer, locally deployed LLMs. Advancements in model accuracy and the integration of clinical metadata could further enhance accessible and reliable clinical decision support systems.

Source
http://dx.doi.org/10.1515/dx-2025-0014

Similar Publications

Patient-reported outcomes after lobectomy vs. segmentectomy for early-stage non-small cell lung cancer.

Surg Endosc

September 2025

Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.

Background: Surgical resection is the cornerstone for early-stage non-small cell lung cancer (NSCLC), with lobectomy historically standard. Evolving techniques have spurred debate comparing lobectomy and segmentectomy. This study analyzed early postoperative patient-reported symptoms and functional status in patients with early NSCLC undergoing either procedure.

Purpose: The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases.

Methods: We simulated STS-MTBs using four LLMs: Llama 3.2-vision:90b, Claude 3.

Background: Clinical communication is central to the delivery of effective, timely, and safe patient care. The use of text-based tools for clinician-to-clinician communication, commonly referred to as secure messaging, has increased exponentially over the past decade. The use of secure messaging has a potential impact on clinician work behaviors, workload, and cognitive burden.

Artificial Intelligence in allergy and immunology: recent developments, implementation challenges, and the road towards clinical impact.

J Allergy Clin Immunol

September 2025

University of Groningen, University Medical Center Groningen, Beatrix Children's Hospital, Department of Pediatric Pulmonology and Pediatric Allergology, Groningen, the Netherlands; University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC)

Artificial intelligence (AI) is increasingly recognized for its capacity to transform medicine. While publications applying AI in allergy and immunology have increased, clinical implementation substantially lags behind other specialties. By mid-2024, over 1,000 FDA-approved AI-enabled medical devices existed, but none specifically addressed allergy and immunology.

[Artificial Intelligence Methods - a Perspective for Cardiovascular Telemedicine?].

Dtsch Med Wochenschr

September 2025

Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charité Universitätsmedizin Berlin, Berlin, Deutschland.

Since 2022, an estimated 150,000 to 200,000 patients with heart failure (HF) in Germany have met the inclusion criteria for HF telemonitoring in accordance with the Federal Joint Committee's (G-BA) decision. Currently, only a few artificial intelligence (AI) applications are used in standard cardiovascular telemedicine care. However, AI applications could improve the predictive accuracy of existing telemedical sensor technology by recognising patterns across multiple data sources.
