Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment.

Bashar Hasan, Samer Saadi, Noora S Rajjoub, Moustafa Hegazi, Mohammad Al-Kordi, Farah Fleti, Magdoleen Farah, Irbaz B Riaz, Imon Banerjee, Zhen Wang, Mohammad Hassan Murad

BMJ Evid Based Med

Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA.

Published: November 2024

Large language models (LLMs) may facilitate and expedite systematic reviews, although how best to integrate them into the review process is unclear. This study evaluates the agreement of GPT-4 with human reviewers in assessing risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. In the case study, raw per cent agreement was highest for the ROBINS-I domain of 'Classification of Intervention'. The Kendall agreement coefficient was highest for the domains of 'Participant Selection', 'Missing Data' and 'Measurement of Outcomes', suggesting moderate agreement in these domains. Raw agreement on the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Given the level of agreement with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains necessary.
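The two agreement measures named in the abstract can be illustrated with a short sketch. The abstract does not say which Kendall statistic was used, so the code below assumes Kendall's tau-b (a rank correlation with tie correction); the ordinal mapping of ROBINS-I judgement levels, the function names and the example ratings are all hypothetical and are not the study's data.

```python
from itertools import combinations
from math import sqrt

# Assumed ordinal ranks for ROBINS-I judgement levels (illustrative mapping).
LEVELS = {"low": 0, "moderate": 1, "serious": 2, "critical": 3}


def raw_agreement(a, b):
    """Fraction of items on which two raters give the same judgement."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kendall_tau_b(a, b):
    """Kendall's tau-b for two ordinal rating lists (assumed statistic)."""
    x = [LEVELS[v] for v in a]
    y = [LEVELS[v] for v in b]
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # pair tied for both raters: excluded from both factors
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Invented ratings for five studies in one domain (not from the paper).
human = ["low", "moderate", "serious", "moderate", "low"]
llm = ["low", "serious", "serious", "low", "low"]

print(f"raw agreement: {raw_agreement(human, llm):.2f}")
print(f"Kendall tau-b: {kendall_tau_b(human, llm):.2f}")
```

Raw agreement counts only exact matches, whereas a rank-based coefficient also credits ratings that are ordered consistently, which is one reason the two measures can rank domains differently.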

DOI: http://dx.doi.org/10.1136/bmjebm-2023-112597

Top Keywords: systematic reviews; risk bias; case study; large language; language models; framework integrating; integrating llms; llms systematic; human reviewer; agreement

Similar Publications

Prevalence of biofilm in chronic wounds: systematic review with meta-analysis.

Wounds

August 2025

Department of Nursing, Federal University of Ceará, Ceará, Brazil.

Manuela de Mendonça Figueirêdo Coelho, Beatriz Moreira Alves Avelino, Beatriz Alves de Oliveira, Mariana Araújo Rios, Fabiane do Amaral Gubert

Background: To estimate the prevalence of biofilms in chronic wounds.

Methods: The authors performed a systematic review of prevalence studies and meta-analysis, structured according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines. Articles were searched in Scopus (Elsevier), Web of Science (Clarivate), MEDLINE/PubMed (National Institutes of Health), and Embase (Elsevier) databases.

Is the Time Right to Start Testing if Late-Onset Depression is Associated With the Development of an α-Synucleinopathy Like Parkinson's Disease? A Double Systematic Review and Meta-Analysis.

Am J Geriatr Psychiatry

August 2025

Department of Psychiatry (MLO, SEC, JZ, KS), Amsterdam UMC, University of Amsterdam, Amsterdam Neuroscience, Amsterdam, The Netherlands; Neuroimmunology Research Group (KS), Netherlands Institute for Neuroscience, Amsterdam, The Netherlands; Psychiatric Program of the Netherlands Brain Bank (KS), Ne

Jan Booij, Rik Schalbroeck, Madeleine Wartenhorst, Carmen F M van Hooijdonk, Youssef Chahid

Parkinson's disease (PD) is characterized by two neurobiological markers: pathological α-synuclein and/or a dopaminergic deficit. Depression is common in PD, and may precede motor signs, particularly in late-onset depression (LOD). We conducted two systematic reviews and a meta-analysis to examine the relationship between depression and PD development.

Life Experiences of Women Diagnosed With Cervical Cancer in Sub-Saharan Africa: A Systematic Review of Qualitative Studies.

Psychooncology

September 2025

Department of Clinical Nursing, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania.

Emmanuel Z Chona, Rashid A Gosse, Emanueli Amosi Msengi, Therese Polin Aruldhas, Lutengano Mkonongo

Background: Sub-Saharan Africa (SSA) bears the highest global burden of cervical cancer. Living with the disease is a complex experience, leading to significant changes across various biopsychosocial dimensions, which in turn affect the quality of life of affected women.

Aims: This review aimed to synthesize available scientific evidence on the life experiences of women diagnosed with cervical cancer in SSA in order to generate valuable insights into the care of the affected population.

The renal baroreflex: A systematic review and meta-analysis in healthy and hypertensive animals.

Physiol Rep

September 2025

Department of Obstetrics and Gynecology, Radboud University Medical Center, Nijmegen, The Netherlands.

Maaike van Ochten, Wisal El Fathi, Esmee M E Bovee, Marc E A Spaanderman, Carlijn R Hooijmans

The renal baroreflex describes the dose-dependent relation between renal pressure and renin release. Former studies have approximated this relation through animal experiments, but the exact shape of the response curve and its alteration by hypertension remain unclear. Therefore, we conducted a systematic review and meta-analysis on the renal baroreflex in healthy and hypertensive animals.

Reported use of implementation science theories, models, and frameworks in 151 implementation trials: secondary analysis of a systematic review targeting nursing practice.

Transl Behav Med

January 2025

Ingram School of Nursing, Faculty of Medicine and Health Sciences, McGill University, Montréal, Canada.

Charlene Weight, Rachael Laritz, Simonne E Collins, Meagan Mooney, Billy Vinette

Background: Theories, models, and frameworks (TMFs) are central to the development and evaluation of implementation strategies supporting evidence-based practice (EBP). However, evidence on how and to what extent TMFs are used in implementation trials remains limited.

Purpose: This study aimed to examine the nature and extent of TMF use in implementation trials, identify which TMFs are most frequently employed, and explore temporal trends in their use.
