AI-Assisted Hypothesis Generation to Address Challenges in Cardiotoxicity Research: Simulation Study Using ChatGPT With GPT-4o.

Yilan Li, Tianshu Gu, Chengyuan Yang, Minghui Li, Congyi Wang, Lan Yao, Weikuan Gu, DianJun Sun

J Med Internet Res

The Second Affiliated Hospital of Harbin Medical University, Centre for Endemic Disease Control, Chinese Centre for Disease Control and Prevention, Harbin Medical University, Key Laboratory of Etiologic Epidemiology, Education Bureau of Heilongjiang Province & Ministry of Health, Harbin, China.

Published: May 2025

Background: Cardiotoxicity is a major concern in heart disease research because it can lead to severe cardiac damage, including heart failure and arrhythmias.

Objective: This study aimed to explore the ability of ChatGPT with GPT-4o to generate innovative research hypotheses to address 5 major challenges in cardiotoxicity research: the complexity of mechanisms, variability among patients, the lack of detection sensitivity, the lack of reliable biomarkers, and the limitations of animal models.

Methods: ChatGPT with GPT-4o was used to generate multiple hypotheses for each of the 5 challenges. These hypotheses were then independently evaluated by 3 experts for novelty and feasibility. ChatGPT with GPT-4o subsequently selected the most promising hypothesis from each category and provided detailed experimental plans, including background, rationale, experimental design, expected outcomes, potential pitfalls, and alternative approaches.

Results: ChatGPT with GPT-4o generated 96 hypotheses, of which 13 (14%) were rated as highly novel and 62 (65%) as moderately novel. The average group score of 3.85 indicated a strong level of innovation in these hypotheses. Literature searching identified at least 1 relevant publication for 28 (29%) of the 96 hypotheses. The selected hypotheses included using single-cell RNA sequencing to understand cellular heterogeneity, integrating artificial intelligence with genetic profiles for personalized cardiotoxicity risk prediction, applying machine learning to electrocardiogram data for enhanced detection sensitivity, using multi-omics approaches for biomarker discovery, and developing 3D bioprinted heart tissues to overcome the limitations of animal models. Our group's evaluation of the 30 dimensions of the experimental plans for the 5 hypotheses selected by ChatGPT with GPT-4o revealed consistent strengths in the background, rationale, and alternative approaches, with most of the hypotheses (20/30, 67%) receiving scores of ≥4 in these areas. While the hypotheses were generally well received, the experimental designs were often deemed overly ambitious, highlighting the need for more practical considerations.
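The percentages in the Results paragraph can be recomputed from the reported counts. The following is a minimal sketch, not part of the study: the labels, the `percent` and `check` helpers, and the round-to-nearest-whole-percent convention are assumptions made for illustration. The denominator of 30 is consistent with 5 experimental plans scored on the 6 listed components, although the abstract does not state this breakdown explicitly.

```python
# Counts as reported in the Results paragraph: (numerator, denominator, reported %).
# Labels are illustrative paraphrases, not terms defined by the study.
reported = {
    "highly novel hypotheses": (13, 96, 14),
    "moderately novel hypotheses": (62, 96, 65),
    "hypotheses with at least 1 relevant publication": (28, 96, 29),
    "plan dimensions scored >=4": (20, 30, 67),
}


def percent(numerator: int, denominator: int) -> int:
    """Return the share as a whole-number percentage (assumed rounding rule)."""
    return round(100 * numerator / denominator)


def check(counts: dict) -> dict:
    """Map each label to True when the recomputed percentage matches the reported one."""
    return {label: percent(n, d) == pct for label, (n, d, pct) in counts.items()}


if __name__ == "__main__":
    results = check(reported)
    for label, (n, d, pct) in reported.items():
        status = "OK" if results[label] else "MISMATCH"
        print(f"{label}: {n}/{d} = {percent(n, d)}% (reported {pct}%) {status}")
```

Under the same assumptions, `percent(13 + 62, 96)` gives the share of hypotheses rated at least moderately novel, a derived figure that the abstract does not report directly.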

Conclusions: Our study demonstrates that ChatGPT with GPT-4o can generate innovative and potentially impactful hypotheses for overcoming critical challenges in cardiotoxicity research. These findings suggest that artificial intelligence-assisted hypothesis generation could play a crucial role in advancing the field of cardiotoxicity, leading to more accurate predictions, earlier detection, and better patient outcomes.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123237
DOI: http://dx.doi.org/10.2196/66161

Similar Publications

A multi-model longitudinal assessment of ChatGPT performance on medical residency examinations.

Front Artif Intell

August 2025

Department of Biomedical Sciences, School of Health Sciences, State University of Rio Grande do Norte, Mossoró, Brazil.

Maria Eduarda Varela Cavalcanti Souto, Alexandre Chaves Fernandes, Ana Beatriz Santana Silva, Louise Helena de Freitas Ribeiro, Thales Allyrio Araújo de Medeiros Fernandes

Introduction: ChatGPT, a generative artificial intelligence, has potential applications in numerous fields, including medical education. This potential can be assessed through its performance on medical exams. Medical residency exams, critical for entering medical specialties, serve as a valuable benchmark.

Assessing the Diagnostic Capabilities of ChatGPT-4 Omni in Grading Diabetic Retinopathy Fundoscopy Using Color Fundus Photographs.

Clin Ophthalmol

August 2025

University of Virginia School of Medicine, Charlottesville, VA, USA.

Nitin Chetla, Sai S Samayamanthula, Joseph He Chang, Arnold Y Leigh, Sinan Akosman

Purpose: Diabetic retinopathy (DR) is a leading cause of vision loss in working-age adults. Despite the importance of early DR detection, only 60% of patients with diabetes receive recommended annual screenings due to limited eye care provider capacity. FDA-approved AI systems were developed to meet the growing demand for DR screening; however, high costs and specialized equipment limit accessibility.

Interpreting BI-RADS-Free Breast MRI Reports Using a Large Language Model: Automated BI-RADS Classification From Narrative Reports Using ChatGPT.

Acad Radiol

September 2025

Department of Radiology, Başakşehir Çam and Sakura City Hospital, Istanbul, Turkey (E.E.).

Deniz Esin Tekcan Sanli, Ahmet Necati Sanli, Gizem Ozmen, Aycil Ozmen, Irem Cihan

Purpose: This study aimed to evaluate the performance of ChatGPT (GPT-4o) in interpreting free-text breast magnetic resonance imaging (MRI) reports by assigning BI-RADS categories and recommending appropriate clinical management steps in the absence of explicitly stated BI-RADS classifications.

Methods: This retrospective, single-center study included 352 documented full-text breast MRI reports dated between January 2024 and June 2025, each describing at least one identifiable breast lesion with descriptive imaging findings. Reports that were incomplete due to technical limitations, reports describing only normal findings, and MRI examinations performed at external institutions were excluded.

Performance of ChatGPT, Gemini and DeepSeek for non-critical triage support using real-world conversations in emergency department.

BMC Emerg Med

September 2025

Department of Emergency Medicine, Korea University Ansan Hospital, Ansan-si, 15355, Republic of Korea.

Sukyo Lee, Sumin Jung, Jong-Hak Park, Hanjin Cho, Sungwoo Moon

Background: Timely and accurate triage is crucial for emergency department (ED) care. Recently, there has been growing interest in applying large language models (LLMs) to support triage decision-making. However, most existing studies have evaluated these models using simulated scenarios rather than real-world clinical cases.

Comparative evaluation of AI platforms "Google Gemini 2.5 Flash, Google Gemini 2.0 Flash, DeepSeek V3 and ChatGPT 4o" in solving multiple-choice questions from different subtopics of anatomy.

Surg Radiol Anat

August 2025

Department of Radiodiagnosis, Gandhi Medical College, Bhopal, Madhya Pradesh, India.

Anjali Singal, Swati Goyal

Purpose: The rise of artificial intelligence (AI)-based large language models (LLMs) has had a profound impact on medical education. Given the widespread use of multiple-choice questions (MCQs) in anatomy education, it is likely that such queries are commonly directed to AI tools. The current study compared the accuracy of different AI platforms in solving MCQs from various subtopics in anatomy.
